Resolving the full spectrum of human genome variation using Linked-Reads
- Patrick Marks¹,
- Sarah Garcia¹,
- Alvaro Martinez Barrio¹,
- Kamila Belhocine¹,
- Jorge Bernate¹,
- Rajiv Bharadwaj¹,
- Keith Bjornson¹,
- Claudia Catalanotti¹,
- Josh Delaney¹,
- Adrian Fehr¹,
- Ian T. Fiddes¹,
- Brendan Galvin¹,
- Haynes Heaton¹,⁵,
- Jill Herschleb¹,
- Christopher Hindson¹,
- Esty Holt²,
- Cassandra B. Jabara¹,⁶,
- Susanna Jett¹,⁷,
- Nikka Keivanfar¹,
- Sofia Kyriazopoulou-Panagiotopoulou¹,⁸,
- Monkol Lek³,⁴,
- Bill Lin¹,
- Adam Lowe¹,
- Shazia Mahamdallie²,
- Shamoni Maheshwari¹,
- Tony Makarewicz¹,
- Jamie Marshall⁴,
- Francesca Meschi¹,
- Christopher J. O'Keefe¹,
- Heather Ordonez¹,
- Pranav Patel¹,
- Andrew Price¹,
- Ariel Royall¹,
- Elise Ruark²,
- Sheila Seal²,
- Michael Schnall-Levin¹,
- Preyas Shah¹,
- David Stafford¹,
- Stephen Williams¹,
- Indira Wu¹,
- Andrew Wei Xu¹,
- Nazneen Rahman²,
- Daniel MacArthur³,⁴ and
- Deanna M. Church¹,⁹
- ¹10x Genomics, Pleasanton, California 94566, USA;
- ²The Institute of Cancer Research, Division of Genetics and Epidemiology, London SM2 5NG, United Kingdom;
- ³Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA;
- ⁴Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
Abstract
Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long-range information while maintaining the advantages of short reads. Starting from ∼1 ng of high-molecular-weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow the barcoded short reads to be associated with their original long molecules, producing a new data type known as “Linked-Reads.” This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes, including the disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single-exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
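The idea of associating barcoded short reads with their original long molecules can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Read` type, the `infer_molecules` function, and the 50 kb `max_gap` threshold are all hypothetical choices made for illustration. The sketch assumes each read has already been aligned and carries a barcode, and it treats reads that share a barcode and lie close together on the same chromosome as coming from one molecule.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str  # barcode shared by reads from the same partition
    chrom: str    # chromosome the read aligned to
    pos: int      # alignment start position


def infer_molecules(reads, max_gap=50_000):
    """Group reads by (barcode, chromosome) and split on large gaps.

    Returns tuples of (barcode, chrom, start, end, read_count).
    max_gap is an illustrative threshold, not a value from the paper.
    """
    by_key = defaultdict(list)
    for r in reads:
        by_key[(r.barcode, r.chrom)].append(r.pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for p in positions[1:]:
            if p - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = p, 0
            prev = p
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


reads = [
    Read("A", "chr1", 1_000),
    Read("A", "chr1", 20_000),
    Read("A", "chr1", 500_000),
    Read("B", "chr1", 1_500),
]
molecules = infer_molecules(reads)
```

With this toy input, the two nearby reads with barcode `A` form one inferred molecule, while the distant `A` read and the `B` read each form their own. Splitting on a gap threshold is only one possible heuristic for separating molecules that happen to share a barcode.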
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.234443.118.
- Freely available online through the Genome Research Open Access option.
- Received January 9, 2018.
- Accepted February 21, 2019.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.