Review
Advances in sequencing technology

https://doi.org/10.1016/j.mrfmmm.2005.01.004

Abstract

Faster sequencing methods will undoubtedly lead to faster single nucleotide polymorphism (SNP) discovery. The Sanger method has served as the cornerstone for genome sequence production since 1977, almost 30 years of tremendous utility [Sanger, F., Nicklen, S., Coulson, A.R., DNA sequencing with chain-terminating inhibitors, Proc. Natl. Acad. Sci. U.S.A. 74 (1977) 5463–5467]. With the completion of the human genome sequence [Venter, J.C. et al., The sequence of the human genome, Science 291 (2001) 1304–1351; Lander, E.S. et al., Initial sequencing and analysis of the human genome, Nature 409 (2001) 860–921], there is now a focus on developing new sequencing methodologies that will enable “personal genomics”, or the routine study of our individual genomes. Technologies that will lead us to this lofty goal are those that can provide improvements in three areas: read length, throughput, and cost. As progress is made in this field, large sections of genomes and then whole genomes of individuals will become increasingly easy to sequence. SNP discovery efforts will be enhanced in lock-step with these improvements. Here, the breadth of new sequencing approaches is summarized, including their status and prospects for enabling personal genomics.

Introduction

Rapid genome sequencing is one of the grand challenges of genome science today. With the completion of the human genome sequence [2], [3], we ask “What next?” The answer is that we need the genome sequence for all individuals to fully understand genome variation, genetic susceptibility to disease, and pharmacogenomics of drug response. The leading genome centers and scientists have publicly recognized this as one of the core enabling goals for the next 5–10 years. The National Human Genome Research Institute (NHGRI) has echoed this need through its recently announced vision for genomics research [4], and The J. Craig Venter Science Foundation has recently announced a US$ 500,000 prize to the group or individual who “significantly advances automated DNA sequencing …”. The NHGRI has categorized new sequencing approaches into those that offer near-term and revolutionary benefits. Those that are near-term should advance the field with a 100-fold cost reduction per base pair (bp), within the next 5 years. Those that are revolutionary should advance the field with a 10,000-fold cost reduction per base pair, within the next 5–10 years; these approaches should be able to attain the “US$ 1000 genome”, or sequencing a human genome for US$ 1000. While the major emphasis is on cost, both read length and throughput dictate the practicality of each method and the applicability of each to de novo sequencing and/or resequencing.
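The cost targets above can be made concrete with a short calculation. The sketch below is illustrative only: it takes as its baseline the US$ 24 million draft-genome estimate quoted in this review's Conclusion, and it assumes that cost scales linearly with the number of bases sequenced. The function and constant names are hypothetical, and the baseline used by the NHGRI for its own targets may differ.

```python
# Illustrative arithmetic only. The baseline is the draft human-sized genome
# estimate quoted in this review's Conclusion; the fold reductions are the
# NHGRI near-term and revolutionary targets described above.
BASELINE_COST_USD = 24_000_000
TARGET_GENOME_COST_USD = 1_000

NHGRI_FOLD_REDUCTIONS = {
    "near-term": 100,
    "revolutionary": 10_000,
}


def cost_after_reduction(baseline_usd: float, fold: float) -> float:
    """Cost per genome after a given fold reduction (assumes linear scaling)."""
    return baseline_usd / fold


def fold_needed(baseline_usd: float, target_usd: float) -> float:
    """Fold reduction required to move from the baseline to a target cost."""
    return baseline_usd / target_usd


if __name__ == "__main__":
    for label, fold in NHGRI_FOLD_REDUCTIONS.items():
        cost = cost_after_reduction(BASELINE_COST_USD, fold)
        print(f"{label}: {fold}-fold reduction -> US$ {cost:,.0f} per genome")
    needed = fold_needed(BASELINE_COST_USD, TARGET_GENOME_COST_USD)
    print(f"Fold reduction needed for the US$ 1000 genome: {needed:,.0f}")
```

Under these assumptions, a 10,000-fold reduction from the US$ 24 million baseline lands within the same order of magnitude as the US$ 1000 genome rather than exactly on it.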

The need for novel DNA sequencing technologies is well substantiated. While enormous progress has been made over the past 15 years in depositing an exponentially growing amount of information into GenBank (Fig. 1), we still need individual sequence information. The completed human genome represents 0.000000001% of all human DNA that has been sequenced. The world's 6.39 billion individuals need to be fully considered. In addition to humans, the NHGRI has a significant initiative to sequence the genomes of other organisms, including the armadillo, bat, chimpanzee, deer, and echidna. Similarly, in post-human genome sequencing efforts, Venter et al. have pursued the understanding of additional biodiversity through the Sargasso Sea, discovering 148 previously unknown bacterial phylotypes in an environmental shotgun sequencing project [5]. With an estimated 10–100 million species on Earth, the amount of genetic diversity is staggering. For dominant mammals alone, there are approximately 4600–4800 species; and for ants alone, there are approximately 11,000 species. The ability to increase the number of characterized species through faster sequencing will enable more powerful phylogenetic studies to be performed, including those critical for human disease gene identification and functional analysis.

The inability to study large numbers of individuals has limited the current estimates of human genetic diversity. Studies from The SNP Consortium were performed on only 10 individuals [6], and those from Perlegen (Mountain View, CA) on only 20 individuals [7]. Even with limited data sets, these studies have yielded insight into Homo sapiens genetic diversity in the form of haplotype studies and SNP maps, showing that SNPs in linkage disequilibrium exist as haplotype blocks and that the average spacing between SNPs is 1000 base pairs [6], [7], [8]. More ambitious efforts are currently underway in the form of the International Haplotype Map (HapMap) project [9], [10], [11], [12], [13]. Though larger in scale than most other studies at US$ 100 million, this effort plans to analyze only 270 individuals for haplotype structure and diversity. The HapMap project exists mainly to simplify genetic association studies, potentially reducing the number of SNPs per analyzed individual from 10 million to roughly 500,000, with this lower figure potentially an underestimate [8]. The current studies and methods highlight the challenges faced by genetics researchers: only small representative populations can be analyzed, at slow rates, and with limited technology. Larger sample sizes are needed in efforts to understand human diseases, inherited traits, and evolution. Faster sequencing, particularly revolutionary sequencing methods, will facilitate some of the current strategies and obviate others. The benefits include the ability to create a more comprehensive SNP catalogue and the possibility of using sequencing as a universal tool to score variations between individuals. Furthermore, with the development of sequencing methods whose read lengths are comparable to the length of haplotype blocks, haplotype maps would no longer be required to perform statistically significant genetic studies.
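The figures in the preceding paragraph can be related with simple arithmetic. The sketch below is a rough illustration, not an analysis from the cited studies: it computes the reduction in SNPs to be scored per individual (10 million to roughly 500,000) and, assuming SNPs are evenly spaced at the quoted average of 1000 base pairs, the number of SNPs a single read of a given length would be expected to span. The function names and the example read length are hypothetical.

```python
# Figures quoted in the text above.
CANDIDATE_SNPS = 10_000_000   # SNPs per analyzed individual without a haplotype map
TAG_SNPS = 500_000            # rough figure after HapMap-based selection
AVG_SNP_SPACING_BP = 1_000    # average spacing between SNPs, in base pairs


def reduction_factor(before: int, after: int) -> float:
    """Fold reduction in the number of SNPs that must be scored."""
    return before / after


def expected_snps_per_read(read_length_bp: int,
                           spacing_bp: int = AVG_SNP_SPACING_BP) -> float:
    """Expected SNPs spanned by one read, assuming evenly spaced SNPs."""
    return read_length_bp / spacing_bp


if __name__ == "__main__":
    fold = reduction_factor(CANDIDATE_SNPS, TAG_SNPS)
    print(f"HapMap reduction in SNPs per individual: {fold:.0f}-fold")
    # The 50,000 bp read length is an arbitrary example value, not from the source.
    print(f"SNPs spanned by a 50,000 bp read: {expected_snps_per_read(50_000):.0f}")
```

The even-spacing assumption is a simplification; actual SNP density varies along the genome.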

With “personal genomics,” a term that describes individuals' access to their own genome sequence information, on the horizon, it is apt to summarize the efforts of the groups who are working to make this a reality. This paper is intended to review the state-of-the-art for Sanger-based methods and also progress for new methods, those that do not rely on electrophoretic separation of Sanger dideoxy reaction products. It is not intended to summarize new developments related to the Sanger method, to describe the ethical, legal, and societal implications (ELSI) of personal genomics, or to expound on the potential worldly and otherworldly impact of such a technology paradigm shift; instead, this review will present a thorough and detailed analysis of each method's technical status, its strengths and weaknesses, and its remaining challenges.

Section snippets

Sequencing technology overview

Sequencing technology refers to the suite of instruments, disposables, protocols, and methods that are involved from sample collection, to sample isolation, to sample preparation, to sequencing, to data assembly, and through sequence finishing (Fig. 2). Very often instrument-to-instrument comparisons are made; this narrow interpretation is erroneous since a majority of the time and cost of genome-scale sequencing is incurred during the sample preparation stages. Conceptually, the ideal

Current state-of-the-art production genome sequencing

Compared with the near-term and revolutionary sequencing goals, current production sequencing is cumbersome. Production-scale genome sequencing is only possible at genome centers where there are significant requirements for space, personnel, and equipment. Today, these sequencing centers represent the state-of-the-art in genome sequencing, having optimized high-throughput sample preparation, sequence production, and data analysis. Furthermore, overall sequencing strategies have been tested,

Near-term sequencing approaches

Technologies that will lead us to 100× cost improvements over the next 5 years are classified as near-term sequencing approaches. Near-term sequencing approaches share several common elements. Each approach has some or all of the following elements: highly parallel readout, cycle extension methodology, single molecule polymerase reading methodology, or exonuclease cleavage methodology. While it is challenging to predict the time frame of research, the approaches listed in this category have

Revolutionary sequencing approaches

Technologies that can lead us to the US$ 1000 genome within 5–10 years are classified as revolutionary sequencing approaches. Revolutionary sequencing approaches bypass the need for complex sample preparation, offer the potential for long read lengths, and read DNA in real-time. The two methods, linear DNA scanning and nanopore sequencing, share the common feature of scanning DNA in a linear manner. These are listed in Table 3 along with some of their features. Their proposed sequencing

Direct linear analysis

Direct linear analysis (DLA) enables real-time scanning of single DNA molecules using a nanofluidic device (Fig. 6). The approach relies on the use of photolithographically-defined channels that precisely control DNA conformation. A dilute solution of single DNA molecules is pipetted into the entrance port of the nanodevice. The nanodevice fills by capillary action, and hydrostatic pressure is used to drive the DNA molecules down the channels, first through a wide 50–100 μm region, then through a defined

Conclusion

New technologies for DNA sequencing abound. There is no shortage of proposals to replace the current Sanger method. The current state-of-the-art production sequencing occurs only in well-funded genome centers. The estimated cost for a draft human-sized genome sequence is US$ 24 million, an impractically expensive figure in the context of population studies. With the continued interest in genomics, in understanding human diseases, and in human genetic diversity, a push towards the US$ 1000

References (142)

  • J. Stephan

    Towards a general procedure for sequencing single DNA molecules

    J. Biotechnol.

    (2001)
  • Z. Foldes-Papp

    Fluorescently labeled model DNA sequences for exonucleolytic sequencing

    J. Biotechnol.

    (2001)
  • M. Hinz et al.

    Polymer support for exonucleolytic sequencing

    J. Biotechnol.

    (2001)
  • M. Sauer

    Single molecule DNA sequencing in submicrometer channels: state of the art and future prospects

    J. Biotechnol.

    (2001)
  • R.F. Whiting et al.

    Heavy atoms in model compounds and nucleic acid imaged by dark field transmission electron microscopy

    J. Mol. Biol.

    (1972)
  • B. Canard et al.

    DNA polymerase fluorescent substrates with reversible 3′-tags

    Gene

    (1994)
  • L.V. Mendelman et al.

    Nearest neighbor influences on DNA polymerase insertion fidelity

    J. Biol. Chem.

    (1989)
  • L.M. Davis

    Rapid DNA sequencing based upon single molecule detection

    Genet. Anal. Tech. Appl.

    (1991)
  • J.H. Werner

    Progress towards single-molecule DNA sequencing: a one-color demonstration

    J. Biotechnol.

    (2003)
  • M. Akeson et al.

    Microsecond time-scale discrimination among polycytidylic acid, polyadenylic acid, and polyuridylic acid as homopolymers or as segments within single RNA molecules

    Biophys. J.

    (1999)
  • C. Bustamante et al.

    Single-molecule studies of DNA mechanics

    Curr. Opin. Struct. Biol.

    (2000)
  • F. Sanger et al.

    DNA sequencing with chain-terminating inhibitors

    Proc. Natl. Acad. Sci. U.S.A.

    (1977)
  • J.C. Venter

    The sequence of the human genome

    Science

    (2001)
  • E.S. Lander

    Initial sequencing and analysis of the human genome

    Nature

    (2001)
  • F.S. Collins et al.

    A vision for the future of genomics research

    Nature

    (2003)
  • J.C. Venter

    Environmental genome shotgun sequencing of the Sargasso Sea

    Science

    (2004)
  • R. Sachidanandam

    A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms

    Nature

    (2001)
  • N. Patil

    Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21

    Science

    (2001)
  • S.B. Gabriel

    The structure of haplotype blocks in the human genome

    Science

    (2002)
  • Integrating ethics and science in the International HapMap Project, Nat. Rev. Genet. 5 (2004)...
  • J. Couzin

    Genomics, consensus emerges on HapMap strategy

    Science

    (2004)
  • M.P. Stumpf

    Haplotype diversity and SNP frequency dependence in the description of genetic variation

    Eur. J. Hum. Genet.

    (2004)
  • The International HapMap Project, Nature 426 (2003)...
  • J. Couzin

    Human genome HapMap launched with pledges of US$ 100 million

    Science

    (2002)
  • R.H. Waterston

    Initial sequencing and comparative analysis of the mouse genome

    Nature

    (2002)
  • F. Sanger

    Nucleotide sequence of bacteriophage phi X174 DNA

    Nature

    (1977)
  • L.M. Smith

    Fluorescence detection in automated DNA sequence analysis

    Nature

    (1986)
  • T. Hunkapiller et al.

    Large-scale and automated DNA sequence determination

    Science

    (1991)
  • R.D. Fleischmann

    Whole-genome random sequencing and assembly of Haemophilus influenzae Rd

    Science

    (1995)
  • R.A. Gibbs

    Genome sequence of the Brown Norway rat yields insights into mammalian evolution

    Nature

    (2005)
  • E. Stellwagen et al.

    Unified description of electrophoresis and diffusion for DNA and other polyions

    Biochemistry

    (2003)
  • H. Zhou

    DNA sequencing up to 1300 bases in 2 h by capillary electrophoresis with mixed replaceable linear polyacrylamide solutions

    Anal. Chem.

    (2000)
  • N.J. Dovichi

    DNA sequencing by capillary electrophoresis

    Electrophoresis

    (1997)
  • M.C. Ruiz-Martinez

    DNA sequencing by capillary electrophoresis with replaceable linear polyacrylamide and laser-induced fluorescence detection

    Anal. Chem.

    (1993)
  • B. Ewing et al.

    Base-calling of automated sequencer traces using phred. II. Error probabilities

    Genome Res.

    (1998)
  • B. Ewing et al.

    Base-calling of automated sequencer traces using phred. I. Accuracy assessment

    Genome Res.

    (1998)
  • R.H. Waterston et al.

    On the sequencing of the human genome

    Proc. Natl. Acad. Sci. U.S.A.

    (2002)
  • R. Waterston et al.

    The Human Genome Project: reaching the finish line

    Science

    (1998)
  • J.C. Venter et al.

    A new strategy for genome sequencing

    Nature

    (1996)
  • J.D. McPherson

    A physical map of the human genome

    Nature

    (2001)