Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis
Review
Advances in sequencing technology
Introduction
Rapid genome sequencing is one of the grand challenges of genome science today. With the completion of the human genome sequence [2], [3], we ask “What next?” The answer is that we need the genome sequence for all individuals to fully understand genome variation, genetic susceptibility to disease, and pharmacogenomics of drug response. The leading genome centers and scientists have publicly recognized this as one of the core enabling goals for the next 5–10 years. The National Human Genome Research Institute (NHGRI) has echoed this need through its recently announced vision for genomics research [4], and The J. Craig Venter Science Foundation has recently announced a US$ 500,000 prize to the group or individual who “significantly advances automated DNA sequencing …”. The NHGRI has categorized new sequencing approaches into those that offer near-term and revolutionary benefits. Those that are near-term should advance the field with a 100-fold cost reduction per base pair (bp), within the next 5 years. Those that are revolutionary should advance the field with a 10,000-fold cost reduction per base pair, within the next 5–10 years; these approaches should be able to attain the “US$ 1000 genome”, or sequencing a human genome for US$ 1000. While the major emphasis is on cost, both read length and throughput dictate the practicality of each method and the applicability of each to de novo sequencing and/or resequencing.
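The two NHGRI targets can be put into per-genome terms with simple arithmetic. The sketch below is illustrative only. It assumes the US$ 24 million draft-genome estimate quoted in the Conclusion as the baseline, and it assumes a per-base-pair cost reduction carries over unchanged to the cost of a whole genome. Neither assumption is stated by the NHGRI, and the function names are hypothetical.

```python
# Assumed baseline: the US$ 24 million draft human-sized genome estimate
# quoted in the Conclusion. The NHGRI targets are not tied to this figure
# in the text; it is used here only for illustration.
BASELINE_COST_USD = 24_000_000


def cost_after_reduction(baseline_usd: float, fold: float) -> float:
    """Per-genome cost after a `fold`-fold cost reduction."""
    return baseline_usd / fold


def fold_needed(baseline_usd: float, target_usd: float) -> float:
    """Fold reduction required to bring the baseline down to a target cost."""
    return baseline_usd / target_usd


near_term_cost = cost_after_reduction(BASELINE_COST_USD, 100)
revolutionary_cost = cost_after_reduction(BASELINE_COST_USD, 10_000)
fold_for_1000_genome = fold_needed(BASELINE_COST_USD, 1_000)

print(f"Near-term (100-fold): US$ {near_term_cost:,.0f}")
print(f"Revolutionary (10,000-fold): US$ {revolutionary_cost:,.0f}")
print(f"Fold reduction for a US$ 1000 genome: {fold_for_1000_genome:,.0f}")
```

Under this assumed baseline, a 10,000-fold reduction gives a figure of the same order of magnitude as the US$ 1000 genome, which is consistent with that goal being placed in the revolutionary category.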
The need for broad, novel DNA sequencing technologies is substantiated. While enormous progress has been made over the past 15 years in depositing an exponentially growing amount of information into GenBank (Fig. 1), we are in need of individual sequence information. The completed human genome represents 0.000000001% of all human DNA that has been sequenced. The world's 6.39 billion individuals need to be fully considered. In addition to humans, the NHGRI has a significant effort to sequence the genomes of other organisms, including the armadillo, bat, chimpanzee, deer, and echidna. Similarly, in post-human genome sequencing efforts, Venter et al. have pursued the understanding of additional biodiversity through the Sargasso Sea, discovering 148 previously unknown bacterial phylotypes in an environmental shotgun sequencing project [5]. With a total of 10–100 million species on Earth, the amount of genetic diversity is staggering. For dominant mammals alone, there are approximately 4600–4800 species; and for ants alone, there are approximately 11,000 species. The ability to increase the number of characterized species through faster sequencing will enable more powerful phylogenetic studies to be performed, including those critical for human disease gene identification and functional analysis.
The inability to study large numbers of individuals has limited the current estimates of human genetic diversity. Studies from The SNP Consortium were performed on only 10 individuals [6], and those from Perlegen (Mountain View, CA) on only 20 individuals [7]. With limited data sets, these studies have garnered insight into Homo sapiens genetic diversity in the form of haplotype studies and SNP maps, showing that SNPs in linkage disequilibrium exist as haplotype blocks and that the average spacing between SNPs is 1000 base pairs [6], [7], [8]. More ambitious efforts are currently underway in the form of the International Haplotype Map (HapMap) project [9], [10], [11], [12], [13]. Though larger in scale than most other studies at US$ 100 million, this effort plans to analyze only 270 individuals for haplotype structure and diversity. The HapMap project exists mainly to simplify genetic association studies, potentially reducing the number of SNPs analyzed per individual from 10 million to roughly 500,000, with this lower figure potentially an underestimate [8]. The current studies and methods highlight the challenges faced by genetics researchers: only small representative populations can be analyzed, at slow rates, and with limited technology. Larger sample sizes need to be considered in efforts to understand human diseases, inherited traits, and evolution. Faster sequencing, particularly revolutionary sequencing methods, will facilitate some of the current strategies and obviate others. This includes the ability to create a more comprehensive SNP catalogue and the possibility of using sequencing as a universal tool to score variations between individuals. Furthermore, with the development of sequencing methods that have read lengths comparable to haplotype blocks, haplotype maps would no longer be required to perform statistically significant genetic studies.
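The SNP figures quoted above can be related with a short calculation. This is a rough sketch, not a result from the cited studies. The genome length of about 3 billion base pairs is an assumption that the text does not state, and the function names are hypothetical.

```python
# Figures quoted in the text.
AVERAGE_SNP_SPACING_BP = 1_000
SNPS_BEFORE_HAPMAP = 10_000_000
SNPS_AFTER_HAPMAP = 500_000

# Assumption (not stated in the text): a haploid human genome of
# roughly 3 billion base pairs.
ASSUMED_GENOME_LENGTH_BP = 3_000_000_000


def expected_snp_count(genome_length_bp: int, spacing_bp: int) -> int:
    """Number of SNPs expected at a given average spacing."""
    return genome_length_bp // spacing_bp


def fold_reduction(before: int, after: int) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


snps_at_quoted_spacing = expected_snp_count(
    ASSUMED_GENOME_LENGTH_BP, AVERAGE_SNP_SPACING_BP
)
hapmap_reduction = fold_reduction(SNPS_BEFORE_HAPMAP, SNPS_AFTER_HAPMAP)

print(f"SNPs at 1 per {AVERAGE_SNP_SPACING_BP} bp: {snps_at_quoted_spacing:,}")
print(f"Reduction in SNPs scored per individual: {hapmap_reduction:.0f}-fold")
```

On these numbers, the haplotype-block approach would cut the SNPs scored per individual by a factor of 20, which is the simplification the HapMap project is described as offering.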
With “personal genomics,” a term that describes individuals' access to their own genome sequence information, on the horizon, it is apt to summarize the efforts of the groups who are going to make this a reality. This paper is intended to review the state-of-the-art for Sanger-based methods and also progress for new methods, those that do not rely on electrophoretic separation of Sanger dideoxy reaction products. It is not intended to summarize new developments related to the Sanger method, to describe the ethical, legal, and societal implications (ELSI) of personal genomics, or to expound on the potential worldly and otherworldly impact of such a technology paradigm shift; instead, this review will present a thorough and detailed analysis of each method's technical status, its strengths and weaknesses, and its remaining challenges.
Sequencing technology overview
Sequencing technology refers to the suite of instruments, disposables, protocols, and methods that are involved from sample collection, to sample isolation, to sample preparation, to sequencing, to data assembly, and through sequence finishing (Fig. 2). Very often instrument-to-instrument comparisons are made; this narrow interpretation is erroneous since a majority of the time and cost of genome-scale sequencing is performed during the sample preparation stages. Conceptually, the ideal
Current state-of-the-art production genome sequencing
Compared with the near-term and revolutionary sequencing goals, current production sequencing is cumbersome. Production-scale genome sequencing is only possible at genome centers where there are significant requirements for space, personnel, and equipment. Today, these sequencing centers represent the state-of-the-art in genome sequencing, having optimized high-throughput sample preparation, sequence production, and data analysis. Furthermore, overall sequencing strategies have been tested,
Near-term sequencing approaches
Technologies that will lead us to 100× cost improvements over the next 5 years are classified as near-term sequencing approaches. Near-term sequencing approaches share several common elements. Each approach has some or all of the following elements: highly parallel readout, cycle extension methodology, single molecule polymerase reading methodology, or exonuclease cleavage methodology. While it is challenging to predict the time frame of research, the approaches listed in this category have
Revolutionary sequencing approaches
Technologies that can lead us to the US$ 1000 genome within 5–10 years are classified as revolutionary sequencing approaches. Revolutionary sequencing approaches bypass the need for complex sample preparation, offer the potential for long read lengths, and read DNA in real-time. The two methods, linear DNA scanning and nanopore sequencing, share the common feature of scanning DNA in a linear manner. These are listed in Table 3 along with some of their features. Their proposed sequencing
Direct linear analysis
Direct linear analysis (DLA) enables the real-time single molecule scanning of DNA molecules using a nanofluidic device (Fig. 6). The approach relies on the use of photolithographically-defined channels that precisely control DNA conformation. A dilute solution of single DNA molecules is pipetted into the entrance port of the nanodevice. The nanodevice fills by capillary action, and hydrostatic pressure is used to drive the DNA molecules down the channels, first through a wide 50–100 μm region, then through a defined
Conclusion
New technologies for DNA sequencing abound. There is no shortage of proposals to replace the current Sanger method. The current state-of-the-art production sequencing occurs only in well-funded genome centers. The estimated cost for a draft human-sized genome sequence is US$ 24 million, an impractically expensive figure in the context of population studies. With the continued interest in genomics, in understanding human diseases, and in human genetic diversity, a push towards the US$ 1000
References (142)
- et al. Nucleotide sequence analysis of DNA II. Complete nucleotide sequence of the cohesive ends of bacteriophage lambda DNA. J. Mol. Biol. (1971)
- et al. Nucleotide sequence of bacteriophage lambda DNA. J. Mol. Biol. (1982)
- et al. Automated DNA sequencing and analysis of the human genome. Genomics (1987)
- et al. Capillary gel electrophoresis for DNA sequencing laser-induced fluorescence detection with the sheath flow cuvette. J. Chromatogr. (1990)
- et al. Separation and analysis of DNA sequence reaction products by capillary gel electrophoresis. J. Chromatogr. (1990)
- et al. Separation of DNA restriction fragments by high performance capillary electrophoresis with low and zero crosslinked polyacrylamide using continuous and pulsed electric fields. J. Chromatogr. (1990)
- Separation of DNA fragments by capillary electrophoresis using replaceable linear polyacrylamide matrices. J. Chromatogr. A (1993)
- et al. Microfluidic devices for DNA sequencing: sample preparation and electrophoretic analysis. Curr. Opin. Biotechnol. (2003)
- et al. Fluorescent in situ sequencing on polymerase colonies. Anal. Biochem. (2003)
- et al. Fluorescent high-density labeling of DNA: error-free substitution for a normal nucleotide. J. Biotechnol. (2001)
- Towards a general procedure for sequencing single DNA molecules. J. Biotechnol.
- Fluorescently labeled model DNA sequences for exonucleolytic sequencing. J. Biotechnol.
- Polymer support for exonucleolytic sequencing. J. Biotechnol.
- Single molecule DNA sequencing in submicrometer channels: state of the art and future prospects. J. Biotechnol.
- Heavy atoms in model compounds and nucleic acid imaged by dark field transmission electron microscopy. J. Mol. Biol.
- DNA polymerase fluorescent substrates with reversible 3′-tags. Gene
- Nearest neighbor influences on DNA polymerase insertion fidelity. J. Biol. Chem.
- Rapid DNA sequencing based upon single molecule detection. Genet. Anal. Tech. Appl.
- Progress towards single-molecule DNA sequencing: a one-color demonstration. J. Biotechnol.
- Microsecond time-scale discrimination among polycytidylic acid, polyadenylic acid, and polyuridylic acid as homopolymers or as segments within single RNA molecules. Biophys. J.
- Single-molecule studies of DNA mechanics. Curr. Opin. Struct. Biol.
- DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U.S.A.
- The sequence of the human genome. Science
- Initial sequencing and analysis of the human genome. Nature
- A vision for the future of genomics research. Nature
- Environmental genome shotgun sequencing of the Sargasso Sea. Science
- A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature
- Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science
- The structure of haplotype blocks in the human genome. Science
- Genomics, consensus emerges on HapMap strategy. Science
- Haplotype diversity and SNP frequency dependence in the description of genetic variation. Eur. J. Hum. Genet.
- Human genome HapMap launched with pledges of US$ 100 million. Science
- Initial sequencing and comparative analysis of the mouse genome. Nature
- Nucleotide sequence of bacteriophage phi X174 DNA. Nature
- Fluorescence detection in automated DNA sequence analysis. Nature
- Large-scale and automated DNA sequence determination. Science
- Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science
- Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature
- Unified description of electrophoresis and diffusion for DNA and other polyions. Biochemistry
- DNA sequencing up to 1300 bases in 2 h by capillary electrophoresis with mixed replaceable linear polyacrylamide solutions. Anal. Chem.
- DNA sequencing by capillary electrophoresis. Electrophoresis
- DNA sequencing by capillary electrophoresis with replaceable linear polyacrylamide and laser-induced fluorescence detection. Anal. Chem.
- Base-calling of automated sequencer traces using phred II. Error probabilities. Genome Res.
- Base-calling of automated sequencer traces using phred I. Accuracy assessment. Genome Res.
- On the sequencing of the human genome. Proc. Natl. Acad. Sci. U.S.A.
- The Human Genome Project: reaching the finish line. Science
- A new strategy for genome sequencing. Nature
- A physical map of the human genome. Nature