Abstract
We introduce Scallop, an accurate reference-based transcript assembler that improves reconstruction of multi-exon and lowly expressed transcripts. Scallop preserves long-range phasing paths extracted from reads, while producing a parsimonious set of transcripts and minimizing coverage deviation. On 10 human RNA-seq samples, Scallop produces 34.5% and 36.3% more correct multi-exon transcripts than StringTie and TransComb, respectively, and identifies 67.5% and 52.3% more lowly expressed transcripts. Scallop achieves higher sensitivity and precision than previous approaches over a wide range of coverage thresholds.
Acknowledgements
We thank Cong Ma and Juntao Liu for helpful suggestions and discussions. This research is funded in part by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4554 to C.K., by The Shurl and Kay Curci Foundation, by the US National Science Foundation (CCF-1256087, CCF-1319998), and by the US National Institutes of Health (R01HG007104 and R01GM122935).
Author information
Contributions
M.S. and C.K. designed the method, and M.S. implemented it. M.S. and C.K. designed the experiments, and M.S. conducted them. M.S. and C.K. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–21, Supplementary Tables 1–3, Supplementary Notes 1–7 (PDF 2473 kb)
Supplementary Code
Source Code of Scallop (ZIP 178 kb)
About this article
Cite this article
Shao, M., Kingsford, C. Accurate assembly of transcripts through phase-preserving graph decomposition. Nat Biotechnol 35, 1167–1169 (2017). https://doi.org/10.1038/nbt.4020
DOI: https://doi.org/10.1038/nbt.4020