A real-time decoding sequencing based on dual mononucleotide addition for cyclic synthesis
Introduction
Primer-directed polymerase extension can incorporate thousands of base pairs, which indicates that sequencing-by-synthesis (SBS) technologies have great potential for long read lengths. Existing SBS-based high-throughput sequencing methodologies are classified as single nucleotide addition [2], [3], [4], [5] or four nucleotide addition [6], [7], [8], [9], [10]. The former adds only one kind of nucleotide to a reaction and determines the number of incorporated nucleotides; the latter adds specially modified monomers and determines the type of incorporated nucleotide. In high-throughput DNA sequencing, read length is an important indicator for evaluating a sequencing strategy. For SBS, the read length depends on the number of reaction steps as well as the type of incorporated nucleotides. A large number of steps to read one base pair leads to significant inefficiencies [11]. In SBS with four nucleotide addition, the read length and the cycle efficiency (Ceff) are related by $(C_{\text{eff}})^{\text{read length}} = 0.5$ [12]. When Ceff is 90%, the read length is 7 bp. However, when Ceff is more than 99%, the read length can exceed 100 bp. Therefore, any additional step (such as cleavage or incorporation of nucleotides) is likely to affect Ceff and, eventually, the read length. In addition, the use of modified nucleotides gives rise to much shorter read lengths because of asynchrony [13]. This is probably why existing commercial sequencing instruments that use natural nucleotides as raw materials achieve longer read lengths than those that use modified nucleotides.
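The relation between cycle efficiency and read length can be checked numerically. The sketch below solves Ceff raised to the read length equal to 0.5 for the read length; the function name `read_length` and the `threshold` parameter are illustrative choices, not part of the original work.

```python
import math


def read_length(ceff: float, threshold: float = 0.5) -> float:
    """Solve ceff ** n == threshold for n (the read length in bases).

    `threshold` is the remaining in-phase fraction at which the read
    is considered to end; 0.5 follows the relation quoted in the text.
    """
    if not 0.0 < ceff < 1.0:
        raise ValueError("ceff must lie strictly between 0 and 1")
    return math.log(threshold) / math.log(ceff)


if __name__ == "__main__":
    for ceff in (0.90, 0.99, 0.995):
        print(f"Ceff = {ceff:.3f} -> read length ~ {read_length(ceff):.1f} bp")
```

Under this relation, a Ceff of 90% gives about 7 bp, exactly 99% gives about 69 bp, and 100 bp is reached at a Ceff of roughly 99.3%.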
Real-time sequencing methods have many advantages, such as preserving the characteristics of natural nucleotides and eliminating post-processing between sequencing cycles. One drawback remains: not every sequencing reaction yields an informative signal, which may affect the Ceff and thus the read length. Some strategies may extend the read length by using fewer than four labeled nucleotides. One of these uses six different runs, each with a different two-base combination; the sequence is reconstructed from the order of each pair of bases [14]. Another, called the ordered label strategy, relies on using all combinations of two labeled nucleotides to determine the order for each set, and then reconstructing the full sequence information [15]. More recently, DNA sequencing technologies such as ABI SOLiD sequencing have been developed that do not measure the base sequence directly but measure DNA bases in pairs and in an encoded form, such as two-base encoding. To compare such data with a reference sequence, the encodings must be decoded into a base sequence [16], [17]. This technology has great potential for error tolerance because it can differentiate biological variants from sequencing errors. However, its read length is limited by the use of fluorescently labeled nucleotides and by the large number of steps needed to read one base pair. In this report, we describe a real-time decoding sequencing strategy in which the template is sequenced twice with dual mononucleotide addition to obtain two sets of encodings, from which the base sequence is then reconstructed. This strategy relies on adding one of the two-base combinations AG, CT, AC, GT, AT and CG to the reaction each time. Here, AG, CT, AC, GT, AT and CG refer to reagent additions containing mixtures of the corresponding mononucleotide triphosphates.
These two-base combinations form three kinds of dual mononucleotide addition: AG/CT, AC/GT, and AT/CG. The strategy interrogates the template sequence with any two of AG/CT, AC/GT, and AT/CG. It is based on the principle that the signal intensities of released detection molecules are proportional to the number of incorporated nucleotides. Because two mononucleotides are added to the reaction each time, much stronger signal intensities are captured, which makes detection of trace amounts of template possible. Moreover, the encodings obtained from a single sequencing run are also useful for differentiating nucleic acid sequences, since the signal intensity in each cycle and the encoding size vary with sequence composition and length. Compared with traditional real-time sequencing, this strategy uses fewer cycles to obtain a longer read length. It is compatible with most commercial sequencing instruments and is likely to serve as an alternative to existing sequencing systems. We hope it will provide researchers with a new technology for analyzing nucleic acid sequences.
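The idea of sequencing twice and reconstructing the bases from two sets of encodings can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the authors' implementation: the input is taken to be the synthesized strand, signal intensities are treated as exact integer counts, and the names `MIXES`, `encode`, `expand` and `decode` are hypothetical. It relies only on the fact that any two of the three pairings share exactly one base per position (for example, AG and AC share only A).

```python
# Hypothetical illustration of dual mononucleotide addition and decoding.
MIXES = {
    "AG/CT": ("AG", "CT"),
    "AC/GT": ("AC", "GT"),
    "AT/CG": ("AT", "CG"),
}


def encode(seq: str, scheme: str) -> list[tuple[str, int]]:
    """Alternate the two mixes of a scheme; record (mix, bases incorporated) per cycle."""
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    mixes = MIXES[scheme]
    encoding, pos, cycle = [], 0, 0
    while pos < len(seq):
        mix = mixes[cycle % 2]
        count = 0
        # Extension continues while the next base belongs to the added mix.
        while pos < len(seq) and seq[pos] in mix:
            count += 1
            pos += 1
        encoding.append((mix, count))
        cycle += 1
    return encoding


def expand(encoding: list[tuple[str, int]]) -> list[str]:
    """Turn per-cycle counts into one mix label per base position."""
    return [mix for mix, count in encoding for _ in range(count)]


def decode(enc1: list[tuple[str, int]], enc2: list[tuple[str, int]]) -> str:
    """Recover the bases by intersecting the two mix labels at each position."""
    labels1, labels2 = expand(enc1), expand(enc2)
    if len(labels1) != len(labels2):
        raise ValueError("encodings cover different lengths")
    bases = []
    for m1, m2 in zip(labels1, labels2):
        common = set(m1) & set(m2)
        if len(common) != 1:
            raise ValueError("encodings must come from two different schemes")
        bases.append(common.pop())
    return "".join(bases)


if __name__ == "__main__":
    seq = "GATTACA"
    run1 = encode(seq, "AG/CT")
    run2 = encode(seq, "AC/GT")
    print(run1)
    print(run2)
    print(decode(run1, run2))  # prints GATTACA
```

In this toy example each run needs five cycles to cover seven bases. A cycle with a count of zero, as at the start of the AC/GT run, simply means that the first mix contained no matching nucleotide.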
Sequences and reagents
All the oligonucleotide sequences used are shown in Table 1. Synthetic sequences were purchased from Invitrogen (Shanghai, China), which also performed the 5′ modifications. The SQA PyroMark Gold Q96 Reagents (1 × 96), and solutions including annealing buffer (20 mM Tris–Acetate, 2 mM MgAc2, pH 7.6), denaturation buffer (0.2 M NaOH), wash buffer (10 mM Tris–Acetate, pH 7.6) and binding buffer (10 mM Tris–HCl, 2 M NaCl, 1 mM EDTA, 0.1% Tween 20, pH 7.6) were purchased from
Principle of the decoding sequencing strategy
Generally, when the first sequencing reaction is performed in the presence of two different nucleotides (named X and Y), an encoding XYn is generated, in which ‘X’ and ‘Y’ represent the types of the two incorporated nucleotides and ‘n’ represents the number of incorporated nucleotides (in this study, a sequencing reaction is defined as a cycle). The obtained encoding XYn corresponds to a total of $2^n$ possible sequences (Table 2). Obviously, a single encoding XYn is not sufficient to determine the
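The number of candidate sequences behind a single encoding XYn can be checked by enumeration. The following sketch assumes each of the n incorporated positions is independently X or Y; the function name `candidates` is hypothetical.

```python
from itertools import product


def candidates(x: str, y: str, n: int) -> list[str]:
    """List every length-n sequence over the two nucleotides x and y."""
    return ["".join(p) for p in product((x, y), repeat=n)]


if __name__ == "__main__":
    cands = candidates("A", "G", 3)
    print(len(cands), cands)
    # 8 ['AAA', 'AAG', 'AGA', 'AGG', 'GAA', 'GAG', 'GGA', 'GGG']
```

The ambiguity doubles with every additional incorporated base, which is why one run alone cannot resolve the sequence.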
Discussion
As described above, this strategy can characterize natural nucleic acids and obtain a longer read length with fewer cycles. It is based on the principle that the quantity of released molecules (here, pyrophosphate) is proportional to the number of incorporated nucleotides. In this strategy, the base sequence can be determined by two parallel sequencing runs. Interestingly, for defined nucleic acid fragments (such as a specified PCR product), the encoding information obtained
Conclusion
In summary, we have developed a novel real-time decoding sequencing strategy in this study. By sequentially decoding two sets of encodings obtained from two parallel sequencing runs, the template sequences were reconstructed successfully. We also applied this decoding strategy to differentiate nucleic acid sequences, and four species with few differences in the P3 regions were successfully differentiated by a single sequencing run. This strategy applies fewer steps to obtain longer
Acknowledgements
This work was supported by the Major State Basic Research Development Program of China (2012CB517706); the National Natural Science Foundation of China (60971018, 61227803); and the Fundamental Research Funds for the Central Universities (CXLX13_112).
References (26)
Advances in sequencing technology. Mutat. Res. (2005)
A sequencing method based on real-time pyrophosphate. Science (1998)
Fluorogenic DNA sequencing in PDMS microreactors. Nat. Methods (2011)
Genome sequencing in microfabricated high-density picolitre reactors. Nature (2005)
An integrated semiconductor device enabling non-optical genome sequencing. Nature (2011)
Real-time DNA sequencing from single polymerase molecules. Science (2009)
Fluoride-cleavable, fluorescently labelled reversible terminators: synthesis and use in primer extension. Chem.-Eur. J. (2011)
Virtual terminator nucleotides for next-generation DNA sequencing. Nat. Methods (2009)
Accurate whole human genome sequencing using reversible terminator chemistry. Nature (2008)
Reconstructed evolutionary adaptive paths give polymerases accepting reversible terminators for sequencing and SNP detection. Proc. Natl. Acad. Sci. U. S. A. (2010)
Four-color DNA sequencing by synthesis using cleavable fluorescent nucleotide reversible terminators. Proc. Natl. Acad. Sci. U. S. A.
Applications of next-generation sequencing: Sequencing technologies – the next generation. Nat. Rev. Genet.
Termination of DNA synthesis by novel 3′-modified-deoxyribonucleoside 5′-triphosphates. Nucleic Acids Res.
1 These authors contributed equally to this work.