FML-seq protocol
Joseph W. Foley
12 Jan 2023
Fragmentation at methylated loci and sequencing (FML-seq) generates a sequencing library, from pre-isolated genomic DNA, in which every sequenceable molecule indicates two methylated cytosine positions in the genome. The FML-seq signal at any given site in the genome is proportional to the fraction of genome copies that were methylated at that site in the sample. Unlike other protocols for DNA methylation profiling based on cytosine deamination by bisulfite or APOBEC, FML-seq does not require harsh chemical conditions or numerous enzymatic reactions, nor does it require an initial DNA fragmentation by sonication. Instead there are only three hands-on steps: digestion of the gDNA by a methylation-specific restriction endonuclease, single-step library synthesis, and cleanup. The entire protocol from gDNA to a sequencing-ready library can be completed in two hours and the reagent costs are minimal.
FML-seq is intended for experiments with statistically powerful numbers of replicates per condition. Single-sample volumes in this protocol are impractical to pipet accurately and are intended to be prepared in a master mix with excess volume. In large experiments it may be useful to aliquot the master mix into a strip of tubes for multichannel pipetting, though this requires a greater excess volume. When using a multichannel pipet to dispense the reagents, consider randomizing or blocking the positions of samples in your plate to avoid confounding any experimental variable with channels of the pipet.
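The randomization suggested above can be sketched in a few lines of Python. This is only an illustration: the sample names, the 96-well layout, and the function `randomize_plate` are hypothetical. It assumes an 8-channel pipet that spans rows A to H of one column, so filling the plate column by column in shuffled order avoids tying any condition to a particular channel.

```python
import random

def randomize_plate(sample_ids, rows="ABCDEFGH", n_columns=12, seed=0):
    """Assign samples to wells in random order, filling the plate column by column."""
    # Wells are listed A1, B1, ..., H1, A2, ... so each column maps to one pipetting stroke.
    wells = [f"{row}{col}" for col in range(1, n_columns + 1) for row in rows]
    if len(sample_ids) > len(wells):
        raise ValueError("more samples than wells")
    shuffled = list(sample_ids)
    # A fixed seed makes the layout reproducible so it can be recorded with the experiment.
    random.Random(seed).shuffle(shuffled)
    return dict(zip(wells, shuffled))

# Hypothetical experiment: 8 treated and 8 control samples
samples = [f"treated_{i}" for i in range(1, 9)] + [f"control_{i}" for i in range(1, 9)]
layout = randomize_plate(samples, seed=42)
for well, sample in layout.items():
    print(well, sample)
```

A blocked design (for example, alternating conditions within each column) would need a different assignment rule; simple shuffling is shown here only because it is the shortest to write down.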
This method has been biologically validated with 6 ng to 60 ng of input gDNA (1,000 to 10,000 human cells). Lower amounts generate seemingly high-quality libraries but they do not contain enough distinct molecules for useful data. Higher amounts have not been tested. To prevent bias by the amount of input, dilute all gDNA samples to the lowest concentration in the experiment and use the same amount of gDNA for all. To normalize the amounts accurately, you must quantify the gDNA by fluorometry (Qubit, RiboGreen) instead of spectrophotometry (NanoDrop UV-Vis).
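As a worked example of normalizing to the lowest concentration, the sketch below applies C1·V1 = C2·V2 to a set of fluorometry readings. The sample names, concentrations, and the 5 µL starting volume are made up for illustration, and `normalize_to_lowest` is a hypothetical helper.

```python
def normalize_to_lowest(conc_ng_per_ul, sample_volume_ul):
    """Return the target concentration and the µL of buffer to add to each sample.

    Each sample starts as `sample_volume_ul` of gDNA at its measured concentration
    and is diluted to the lowest concentration in the set.
    """
    target = min(conc_ng_per_ul.values())
    buffer_to_add = {}
    for name, conc in conc_ng_per_ul.items():
        # C1 * V1 = C2 * V2, so the final volume is V2 = C1 * V1 / C2
        final_volume = conc * sample_volume_ul / target
        buffer_to_add[name] = final_volume - sample_volume_ul
    return target, buffer_to_add

# Hypothetical fluorometry readings in ng/µL
readings = {"sample_1": 12.0, "sample_2": 6.0, "sample_3": 9.0}
target, plan = normalize_to_lowest(readings, sample_volume_ul=5.0)
for name, volume in plan.items():
    print(f"{name}: add {volume:.2f} µL buffer to reach {target} ng/µL")
```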
Compute the number of required PCR cycles based on the amount of genomic DNA input, with the formula
Cycles = 21 − log(ng genomic DNA) ÷ log(1.9)
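The formula can be evaluated as below. The division is done before the subtraction, and because it is a ratio of two logarithms, the base of the logarithm does not matter. One way to read it is that each 1.9-fold increase in input removes one cycle. The protocol does not state how to handle fractional results, so rounding to the nearest whole cycle here is an assumption.

```python
import math

def pcr_cycles(ng_gdna):
    """Cycles = 21 - log(ng genomic DNA) / log(1.9), before rounding."""
    if ng_gdna <= 0:
        raise ValueError("input amount must be positive")
    return 21 - math.log(ng_gdna) / math.log(1.9)

# Inputs spanning the validated range of 6 ng to 60 ng
for ng in (6, 20, 60):
    exact = pcr_cycles(ng)
    # Rounding to the nearest integer is an assumption, not part of the protocol.
    print(f"{ng} ng: {exact:.2f} cycles (nearest whole cycle: {round(exact)})")
```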
Note: When working with large batches in microplates, multichannel pipets reduce handling time, hand strain, and the risk of error. Electronic pipets are especially useful both for mixing liquids by pipetting and for quickly dispensing aliquots of master mix. Most of the protocol involves volumes of 10 µL or less except the final cleanup, so the Eppendorf Xplorer 0.5–10 µL and 5–100 µL models are sufficient; the 200 or 300 µL multichannel pipet for the final ethanol wash can be manual rather than electronic.
Note: There are few choices available for low-retention PCR tubes and microplates; we have made successful libraries in lower-quality vessels, such as 8-tube strips. Many brands of pipet tips are claimed to be low-retention but in our experience some, e.g. Biotix, are still so retentive that they visibly fail to aspirate small volumes accurately. We use the "10 µL XL graduated" TipOne RPT filter tips with our 2.5, 10, and 20 µL Eppendorf pipets and "200 µL graduated" with 100 and 200 µL Eppendorf pipets, which take "universal"-shape tips; for Rainin-shape tips, we have heard but not validated a recommendation for Thomas Scientific SHARP tips. Almost all low-retention tips used in this protocol are 10 µL size except the 100/200 µL tips used briefly for mixing by pipetting in the cleanup step, when low retention may not be as crucial. The 200/300 µL tips used for the final ethanol wash and the 1000 µL tips used for preparing premixes do not need to be low-retention.
All chemicals must be molecular biology-grade (certified free of nucleic acids and nucleases) except where otherwise specified. Some products are available with better pricing on larger scales than the given catalog numbers.
Resuspend lyophilized oligonucleotides in DNA storage buffer at about 100 μM. Optionally verify by UV spectrophotometry (measure 1/10 dilutions for better accuracy) that the yields are within 10% of expectations.
To simplify the routine protocol, premix some of the reagents in large batches. Make sure to fully thaw, vortex, and (if possible) briefly centrifuge all reagents before mixing them, then vortex again to mix and centrifuge the final solution. Store the mixes at –20 °C.
Adjust the volumes of adapter stocks used here according to their measured concentrations (see Oligonucleotide dilutions).
PEG solution is viscous; slowly reverse-pipet it.
PCR primers: dilute each pair together to 3 µM each primer in TE+Tween.
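Because stock concentrations rarely come out at exactly 100 µM, the volumes for a working dilution are best computed from the measured values. The sketch below does this for one primer pair at 3 µM each. The measured concentrations and the 100 µL final volume are hypothetical, as is the helper `primer_pair_dilution`.

```python
def primer_pair_dilution(stock_a_um, stock_b_um, final_volume_ul, target_um=3.0):
    """Return (µL primer A, µL primer B, µL diluent) for a mix at target_um of each primer."""
    # C1 * V1 = C2 * V2 for each primer separately; both share the same final volume.
    vol_a = target_um * final_volume_ul / stock_a_um
    vol_b = target_um * final_volume_ul / stock_b_um
    diluent = final_volume_ul - vol_a - vol_b
    if diluent < 0:
        raise ValueError("stocks are too dilute for this target concentration")
    return vol_a, vol_b, diluent

# Hypothetical measured stock concentrations of 100 µM and 95 µM
vol_a, vol_b, diluent = primer_pair_dilution(100.0, 95.0, final_volume_ul=100.0)
print(f"primer A: {vol_a:.2f} µL, primer B: {vol_b:.2f} µL, TE+Tween: {diluent:.2f} µL")
```

The same arithmetic applies when adjusting adapter stock volumes to their measured concentrations.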
Note: The final holds are safe stopping points and the reaction products are stable, so you can choose a final holding temperature to minimize the instrument's energy consumption and fan noise. For example, 14 °C is the closest to room temperature at which an Applied Biosystems Veriti disables its heated lid. If you are nearby when the program finishes, you can transfer the samples to a refrigerator in order to power off the thermal cycler until you are ready for the next step.
Reaction volume: 5 µL
Reaction volume: 10 µL
The number of PCR cycles is determined by the formula and may vary from one experiment to another.
Genomic DNA must be free of histones and other proteins, e.g. isolated by a protocol that includes proteinase K treatment. Preferably it should not be incubated above 70 °C, to keep it double-stranded and avoid GC-content bias. DNA is best stored in a slightly basic buffer and a surfactant improves pipetting accuracy (see "Reagents"). This protocol generates successful libraries from formalin-fixed, paraffin-embedded (FFPE) samples but has not been biologically validated for that application.
Note: The water volume may be reduced to use a greater volume of gDNA; change the aliquot volumes accordingly.
Safe stopping point: After digestion, the samples can stay in the thermal cycler or a refrigerator for a few days before you continue to library synthesis.
Safe stopping point: After library synthesis, the samples can stay in the thermal cycler or a refrigerator for a few days before you continue to cleanup.
Note: Certified molecular biology-grade water is not required for the 80% ethanol. Low-retention pipet tips are not required for the next steps, until resuspending the pellet.
Note: Perform the remaining steps quickly, until the pellet is resuspended, to prevent the beads from overdrying. If you have a large number of samples, perform the next steps on only half of them before coming back to this point for the other half.
The final yield should be 10 μL of amplified library between 50 and 200 nM, with most of the fragments between 150 and 400 bp. The size distribution can be verified by running the library undiluted with a TapeStation High Sensitivity D1000 kit or equivalent. The electropherogram may show evidence of overamplification: a secondary bump or especially wide smear of molecules that migrate more slowly than the rest, because they comprise complementary annealed adapters and noncomplementary, unannealable inserts. These libraries can still be sequenced, but this artifact makes molarity and size-distribution estimates inaccurate, and it is ideal to recalibrate the PCR cycles to the maximum number that does not produce this artifact.
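If only a mass concentration is available (for example from fluorometry), it can be converted to molarity for comparison with the 50 to 200 nM range. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The example concentration and mean fragment length are hypothetical, and the result inherits any inaccuracy in the size estimate, including the overamplification artifact described above.

```python
def library_molarity_nm(conc_ng_per_ul, mean_length_bp, mass_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM).

    mass_per_bp is the approximate average molar mass of one base pair in g/mol.
    """
    return conc_ng_per_ul * 1e6 / (mass_per_bp * mean_length_bp)

# Hypothetical library: 10 ng/µL with a mean fragment length of 250 bp
molarity = library_molarity_nm(10.0, 250)
print(f"{molarity:.1f} nM")
```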
To determine pooling volumes (if using cleanup for individual samples) and optimize loading concentrations, it is helpful to measure the concentration more precisely by qPCR, which measures sequenceable molecules rather than total DNA content. Roche's KAPA Library Quantification Kits are designed for this purpose.
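Pooling volumes for an equimolar pool follow directly from the measured concentrations. In this sketch the least concentrated library is used at a chosen maximum volume and the others are scaled down in proportion. The library names, concentrations, and the 5 µL maximum are hypothetical, and `equimolar_pool_volumes` is an illustrative helper.

```python
def equimolar_pool_volumes(conc_nm, max_volume_ul):
    """Return µL of each library to pool so that all contribute the same number of moles."""
    lowest = min(conc_nm.values())
    # nM * µL = fmol, so volume is inversely proportional to concentration.
    return {name: max_volume_ul * lowest / conc for name, conc in conc_nm.items()}

# Hypothetical qPCR results in nM
qpcr = {"lib_1": 50.0, "lib_2": 100.0, "lib_3": 200.0}
volumes = equimolar_pool_volumes(qpcr, max_volume_ul=5.0)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µL ({qpcr[name] * vol:.0f} fmol)")
```

If some of the computed volumes are too small to pipet accurately, diluting the most concentrated libraries first is one option.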
A good minimum target for the human genome is 40 million read pairs per library, e.g. up to 96 libraries on an Illumina NovaSeq S2 flow cell. Long reads (> 50 nt) are not useful because this method counts fragments, not bases, but paired-end reads are useful to count both ends of each fragment (use 2×50 on NovaSeq and NextSeq 1000/2000, 2×38 on NextSeq 500). Be sure to include the correct index read lengths for your indexing scheme (8 nt i7 + 8 nt i5 for combinatorial dual indexing, 10 nt i7 + no i5 for i7-only unique single indexing based on IDT/Illumina UDIs). No custom sequencing primers are required as the libraries have standard Nextera adapters. ΦX174 spike-in is not required, but a 1% spike-in is recommended in case the instrument fails and troubleshooting is required.
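The run arithmetic can be checked with a short sketch. The read and index lengths come from the text above. The run capacity is left as a parameter because it depends on the instrument and flow cell, and the value used in the example is simply 96 × 40 million as implied by the text, not an instrument specification. Both function names are hypothetical.

```python
def total_cycles(read1, read2, i7, i5=0):
    """Total sequencing cycles for two insert reads plus index reads."""
    return read1 + read2 + i7 + i5

def libraries_per_run(read_pairs_per_run, target_per_library=40_000_000):
    """Number of libraries that fit in a run at the target depth (rounded down)."""
    return read_pairs_per_run // target_per_library

# 2×50 with combinatorial dual indexing (8 nt i7 + 8 nt i5)
print(total_cycles(50, 50, i7=8, i5=8))
# 2×50 with i7-only indexing (10 nt i7, no i5)
print(total_cycles(50, 50, i7=10))
# 2×38 with combinatorial dual indexing
print(total_cycles(38, 38, i7=8, i5=8))
# Assumed capacity: 96 libraries × 40 million read pairs, as implied by the text
print(libraries_per_run(3_840_000_000))
```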
From the MspJI restriction motif, more than 50% of bases at position 17 in both reads will be G and more than 50% at position 14 will be C or T. This does not interfere with sequencing performance. The organism's methylation motif will also be reflected here, e.g. a human genome with CpG methylation will also be more than 50% C at position 16. GC content throughout the reads may also be higher than across the entire genome if DNA methylation is enriched in GC-rich regions, as in the human genome.
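A quick sanity check of this base composition can be run on a sample of reads. The sketch below computes base fractions at a 1-based position. The four toy reads are fabricated solely to exercise the function and are not real FML-seq data; in practice the reads would come from a FASTQ file, which is not parsed here.

```python
from collections import Counter

def base_fractions(reads, position):
    """Fraction of each base at a 1-based position, among reads long enough to cover it."""
    counts = Counter(read[position - 1] for read in reads if len(read) >= position)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}

# Toy reads: 13 filler bases, then positions 14 to 17, then 3 more filler bases
toy_reads = [
    "A" * 13 + "CACG" + "TTT",
    "C" * 13 + "TGCG" + "AAA",
    "G" * 13 + "TACG" + "CCC",
    "T" * 13 + "ATAT" + "GGG",
]

pos14 = base_fractions(toy_reads, 14)
pos16 = base_fractions(toy_reads, 16)
pos17 = base_fractions(toy_reads, 17)
print("C or T at position 14:", pos14.get("C", 0) + pos14.get("T", 0))
print("C at position 16:", pos16.get("C", 0))
print("G at position 17:", pos17.get("G", 0))
```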
The sequence reads can be aligned to a reference genome with any standard aligner. The restriction digestion fragments DNA at a small number of specific positions, so duplicate fragments will frequently occur by chance; do not remove duplicates.
Suggested pipeline (implemented in FMLtools):
count_fml_hits.py
region_counts.py
Required one-time preparation:
get_sequence_positions.py
Oligonucleotide sequences © 2021 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited.
All sequences are in the order 5′→3′ and all nucleotides are DNA. All cytosines are unmethylated.
N: equimolar mix of A, C, G, T
U: deoxyuridine (DNA not RNA)
*: phosphorothioate bond
[index]: distinct index sequence for each version of the primer
Use HPLC purification. This reduces the complexity of the random bases, but that is less important than the purity of full-length molecules. These oligos form secondary structures in moderate salt conditions but that should not affect spectrophotometry in storage buffer. Although they are used for ligation, the adapters must not have a 5′ phosphate; synthesis without it is standard so simply do not request the optional 5′ phosphate.
NNNNCUGUCUCUUAUACACAUCUTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
NNNNCUGUCUCUUAUACACAUCUGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
Use IDT's Ultramer synthesis or HPLC purification. Choose any indexing scheme and order Nextera-compatible DNA primers using the provided designs, keeping most of the sequence constant and varying only the index. Combinatorial dual indexing is inexpensive but requires careful arrangement of the index pairs. Alternatively, use a P5 primer with no index and only the i7 indexes from the IDT/Illumina unique dual indexing scheme for Nextera, which is more expensive but less difficult to prepare and saves sequencing cycles that can be used to increase the read lengths for the insert instead. Unique dual indexing is not likely to be helpful for FML-seq. TruSeq primers are not compatible and index sequences validated with TruSeq adapters are not recommended for these Nextera adapters.
AATGATACGGCGACCACCGAGATCTACAC[index]TCGTCGGCAGCGT*C
CAAGCAGAAGACGGCATACGAGAT[index]GTCTCGTGGGCTCG*G
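When preparing an order, the full primer sequences can be generated by substituting each index into the two designs above. In this sketch the templates are copied from the protocol, but the index sequences are placeholders invented for illustration and are not validated indexes. Whether an index must be reverse-complemented before insertion depends on the indexing scheme and is not handled here. The `*` for the phosphorothioate bond is kept as written.

```python
from itertools import product

P5_TEMPLATE = "AATGATACGGCGACCACCGAGATCTACAC[index]TCGTCGGCAGCGT*C"
P7_TEMPLATE = "CAAGCAGAAGACGGCATACGAGAT[index]GTCTCGTGGGCTCG*G"

def fill_index(template, index_seq):
    """Insert one index sequence into a primer design containing the [index] placeholder."""
    if not index_seq or set(index_seq) - set("ACGT"):
        raise ValueError(f"index must contain only A, C, G, T: {index_seq!r}")
    return template.replace("[index]", index_seq)

# Placeholder 8 nt indexes (hypothetical, for illustration only)
i5_indexes = {"i5_01": "AAAACCCC", "i5_02": "GGGGTTTT"}
i7_indexes = {"i7_01": "ACGTACGT", "i7_02": "TGCATGCA", "i7_03": "CATGCATG"}

p5_primers = {name: fill_index(P5_TEMPLATE, seq) for name, seq in i5_indexes.items()}
p7_primers = {name: fill_index(P7_TEMPLATE, seq) for name, seq in i7_indexes.items()}

# Combinatorial dual indexing: every i5 primer can be paired with every i7 primer
pairs = list(product(p5_primers, p7_primers))
print(len(pairs), "index combinations from", len(p5_primers) + len(p7_primers), "primers")
```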