  • Brief Communication

Accurate assembly of transcripts through phase-preserving graph decomposition

Abstract

We introduce Scallop, an accurate reference-based transcript assembler that improves reconstruction of multi-exon and lowly expressed transcripts. Scallop preserves long-range phasing paths extracted from reads while producing a parsimonious set of transcripts and minimizing coverage deviation. On 10 human RNA-seq samples, Scallop produces 34.5% and 36.3% more correct multi-exon transcripts than StringTie and TransComb, respectively, and identifies 67.5% and 52.3% more lowly expressed transcripts than those two methods. Scallop achieves higher sensitivity and precision than previous approaches over a wide range of coverage thresholds.
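
To make the sensitivity and precision comparison concrete, the following is a minimal sketch of how these two measures could be computed for a set of assembled multi-exon transcripts. It is not taken from Scallop or from the article's evaluation pipeline. It assumes that a predicted transcript counts as correct when its intron chain exactly matches that of a reference transcript; this matching rule, the `IntronChain` type, and the function name `sensitivity_precision` are illustrative assumptions.

```python
from typing import Iterable, Tuple

# Hypothetical representation: a multi-exon transcript is identified by its
# chromosome, strand, and ordered (start, end) coordinates of its introns.
IntronChain = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def sensitivity_precision(
    predicted: Iterable[IntronChain],
    reference: Iterable[IntronChain],
) -> Tuple[float, float]:
    """Return (sensitivity, precision) under exact intron-chain matching.

    Assumption: a prediction is "correct" if and only if an identical
    intron chain appears in the reference annotation.
    """
    pred = set(predicted)   # duplicates among predictions are collapsed
    ref = set(reference)
    correct = pred & ref    # predictions that match a reference transcript

    sensitivity = len(correct) / len(ref) if ref else 0.0
    precision = len(correct) / len(pred) if pred else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    reference = [
        ("chr1", "+", ((100, 200), (300, 400))),
        ("chr1", "+", ((100, 200), (350, 400))),
        ("chr2", "-", ((50, 80),)),
        ("chr2", "-", ((50, 80), (120, 160))),
    ]
    predicted = [
        ("chr1", "+", ((100, 200), (300, 400))),  # matches the reference
        ("chr2", "-", ((50, 80),)),               # matches the reference
        ("chr2", "-", ((50, 90),)),               # no matching reference chain
    ]
    sens, prec = sensitivity_precision(predicted, reference)
    print(f"sensitivity={sens:.3f} precision={prec:.3f}")
```

Under this definition, sensitivity is the fraction of reference transcripts that are recovered and precision is the fraction of predictions that are correct, so raising a coverage threshold would typically trade the former for the latter.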

Figure 1: Comparison of the three methods (StringTie, TransComb, and Scallop) over the five testing samples.
Figure 2: Overview of Scallop.

Acknowledgements

We thank Cong Ma and Juntao Liu for helpful suggestions and discussions. This research is funded in part by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4554 to C.K., by The Shurl and Kay Curci Foundation, by the US National Science Foundation (CCF-1256087, CCF-1319998), and by the US National Institutes of Health (R01HG007104 and R01GM122935).

Author information

Contributions

M.S. and C.K. designed the method, and M.S. implemented it. M.S. and C.K. designed the experiments, and M.S. conducted them. M.S. and C.K. wrote the manuscript.

Corresponding author

Correspondence to Carl Kingsford.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–21, Supplementary Tables 1–3, Supplementary Notes 1–7 (PDF 2473 kb)

Life Sciences Reporting Summary (PDF 176 kb)

Supplementary Code

Source Code of Scallop (ZIP 178 kb)

About this article

Cite this article

Shao, M., Kingsford, C. Accurate assembly of transcripts through phase-preserving graph decomposition. Nat Biotechnol 35, 1167–1169 (2017). https://doi.org/10.1038/nbt.4020

  • DOI: https://doi.org/10.1038/nbt.4020
