FML-seq protocol
Joseph W. Foley
12 Jan 2023


Introduction

Abstract

Fragmentation at methylated loci and sequencing (FML-seq) generates a sequencing library, from pre-isolated genomic DNA, in which every sequenceable molecule indicates two methylated cytosine positions in the genome. The FML-seq signal at any given site in the genome is proportional to the fraction of genome copies that were methylated at that site in the sample. Unlike other protocols for DNA methylation profiling based on cytosine deamination by bisulfite or APOBEC, FML-seq does not require harsh chemical conditions or numerous enzymatic reactions, nor does it require an initial DNA fragmentation by sonication. Instead there are only three hands-on steps: digestion of the gDNA by a methylation-specific restriction endonuclease, single-step library synthesis, and cleanup. The entire protocol from gDNA to a sequencing-ready library can be completed in two hours and the reagent costs are minimal.

Experimental design considerations

FML-seq is intended for experiments with statistically powerful numbers of replicates per condition. Single-sample volumes in this protocol are impractical to pipet accurately and are intended to be prepared in a master mix with excess volume. In large experiments it may be useful to aliquot the master mix into a strip of tubes for multichannel pipetting, though this requires a greater excess volume. When using a multichannel pipet to dispense the reagents, consider randomizing or blocking the positions of samples in your plate to avoid confounding any experimental variable with channels of the pipet.
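One way to randomize sample positions is sketched below. It is only an illustration, not part of the validated protocol: the function name `randomize_plate`, the 8-row by 12-column layout (rows corresponding to the channels of an 8-channel pipet), the sample names, and the fixed seed are all assumptions.

```python
import random

def randomize_plate(samples, rows="ABCDEFGH", n_cols=12, seed=0):
    """Assign samples to wells in random order, filling the plate column by column."""
    wells = [f"{row}{col}" for col in range(1, n_cols + 1) for row in rows]
    if len(samples) > len(wells):
        raise ValueError("more samples than wells")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the layout reproducible
    return dict(zip(wells, shuffled))

# Hypothetical sample names: 4 replicates each of two conditions
samples = [f"{cond}_{rep}" for cond in ("control", "treated") for rep in range(1, 5)]
layout = randomize_plate(samples)
for well, sample in layout.items():
    print(well, sample)
```

Recording the seed with the experiment makes it possible to regenerate the same layout later. A blocked design would instead distribute each condition evenly across rows, which this sketch does not attempt.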

This method has been biologically validated with 6 ng to 60 ng of input gDNA (1,000 to 10,000 human cells). Lower amounts generate seemingly high-quality libraries but they do not contain enough distinct molecules for useful data. Higher amounts have not been tested. To prevent bias from differing input amounts, dilute all gDNA samples to the lowest concentration in the experiment and use the same amount of gDNA for all. To normalize the amounts accurately, you must quantify the gDNA by fluorometry (Qubit, PicoGreen) instead of spectrophotometry (NanoDrop UV-Vis).
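The normalization described above is a simple C1 × V1 = C2 × V2 calculation. The sketch below assumes fluorometry readings in ng/µL and a chosen final volume per sample; the function name, sample names, and readings are hypothetical.

```python
def dilution_volumes(concentrations, final_volume_ul):
    """Return {sample: (µL of gDNA stock, µL of diluent)} to reach the lowest concentration."""
    target = min(concentrations.values())
    plan = {}
    for name, conc in concentrations.items():
        stock_ul = final_volume_ul * target / conc  # C1 * V1 = C2 * V2
        plan[name] = (round(stock_ul, 2), round(final_volume_ul - stock_ul, 2))
    return plan

# Hypothetical fluorometry readings in ng/µL
readings = {"s1": 12.0, "s2": 30.0, "s3": 48.0}
plan = dilution_volumes(readings, final_volume_ul=10.0)
for name, (stock_ul, diluent_ul) in plan.items():
    print(f"{name}: {stock_ul} µL gDNA + {diluent_ul} µL diluent")
```

Very small stock volumes are hard to pipet accurately, so in practice the final volume may need to be increased until every stock volume is practical.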

Choosing the number of PCR cycles

Compute the number of required PCR cycles based on the amount of genomic DNA input, with the formula

Cycles = 21 − log(ng genomic DNA) ÷ log(1.9)
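As a worked example, the formula can be evaluated as follows. This is a minimal sketch: the function name is an assumption, and rounding to the nearest whole cycle is also an assumption because the formula itself yields a fractional value.

```python
import math

def pcr_cycles(ng_gdna):
    """Cycles = 21 - log(ng genomic DNA) / log(1.9), per the formula above."""
    if ng_gdna <= 0:
        raise ValueError("input mass must be positive")
    # The ratio of two logarithms is the same in any base
    return 21 - math.log(ng_gdna) / math.log(1.9)

for ng in (6, 20, 60):
    cycles = pcr_cycles(ng)
    # Rounding to a whole cycle is an assumption, not stated in the protocol
    print(f"{ng} ng: {cycles:.1f} -> {round(cycles)} cycles")
```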



Advance preparation

Required equipment

Note: Especially when working with large batches in microplates, multichannel pipets reduce handling time, hand strain, and the risk of error. Electronic pipets are especially useful both for mixing liquids by pipetting and for quickly dispensing aliquots of master mix. Most of the protocol involves volumes of 10 µL or less except the final cleanup, so the Eppendorf Xplorer 0.5–10 µL and 5–100 µL models are sufficient; the 200 or 300 µL multichannel pipet for the final ethanol wash can be manual rather than electronic.

Consumables

Note: There are few choices available for low-retention PCR tubes and microplates, but we have made successful libraries in lower-quality vessels, such as 8-tube strips. Many brands of pipet tips are claimed to be low-retention, but in our experience some, e.g. Biotix, are still so retentive that they visibly fail to aspirate small volumes accurately. We use the "10 µL XL graduated" TipOne RPT filter tips with our 2.5, 10, and 20 µL Eppendorf pipets and "200 µL graduated" with 100 and 200 µL Eppendorf pipets, which take "universal"-shape tips; for Rainin-shape tips, we have heard but not validated a recommendation for Thomas Scientific SHARP tips. Almost all low-retention tips used in this protocol are 10 µL size except the 100/200 µL tips used briefly for mixing by pipetting in the cleanup step, when low retention may not be as crucial. The 200/300 µL tips used for the final ethanol wash and 1000 µL tips used for preparing premixes do not need to be low-retention.

Reagents

All chemicals must be molecular biology-grade (certified free of nucleic acids and nucleases) except where otherwise specified. Some products are available with better pricing on larger scales than the given catalog numbers.

Oligonucleotide dilutions

Resuspend lyophilized oligonucleotides in DNA storage buffer at about 100 μM. Optionally verify by UV spectrophotometry (measure 1/10 dilutions for better accuracy) that the yields are within 10% of expectations.
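The resuspension volume and the optional 10% check can be computed as sketched below. The function names are assumptions, and the example yield of 25 nmol is hypothetical; use the yield reported by the oligo manufacturer.

```python
def resuspension_volume_ul(yield_nmol, target_um=100.0):
    """Volume of storage buffer (µL) that brings a dry oligo to the target concentration."""
    return yield_nmol / target_um * 1000.0  # nmol / µM = mL; times 1000 = µL

def within_tolerance(measured_um, expected_um=100.0, tolerance=0.10):
    """True if the measured concentration is within the tolerance of the expected value."""
    return abs(measured_um - expected_um) <= tolerance * expected_um

# Hypothetical example: 25 nmol of lyophilized oligo
print(resuspension_volume_ul(25.0), "µL of storage buffer")
# Hypothetical spectrophotometry result, already multiplied back by the 1/10 dilution factor
print(within_tolerance(93.0))
```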

Reagent premixes

To simplify the routine protocol, premix some of the reagents in large batches. Make sure to fully thaw, vortex, and (if possible) briefly centrifuge all reagents before mixing them, then vortex again to mix and centrifuge the final solution. Store the mixes at –20 °C.

Oligo Mix

[Volume calculator: 4 components in µL, scaled to the number of samples, plus total volume; values not shown]

Adjust the volumes of adapter stocks used here according to their measured concentrations (see Oligonucleotide dilutions).
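A minimal sketch of this adjustment, assuming the nominal volumes were calculated for 100 µM stocks: scale each adapter volume so that it delivers the same number of moles at the measured concentration. The function name and the example numbers are hypothetical.

```python
def adjusted_volume_ul(nominal_volume_ul, measured_um, nominal_um=100.0):
    """Scale a stock volume so it delivers the same moles as the nominal stock would."""
    if measured_um <= 0:
        raise ValueError("measured concentration must be positive")
    return nominal_volume_ul * nominal_um / measured_um

# Hypothetical example: a 10 µL nominal volume of a stock that measured 80 µM
print(adjusted_volume_ul(10.0, measured_um=80.0), "µL")
```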

Synthesis Buffer

[Volume calculator: 6 components in µL, scaled to the number of samples, plus total volume; values not shown]

PEG solution is viscous; slowly reverse-pipet it.

Synthesis Enzymes

[Volume calculator: 3 components in µL, scaled to the number of samples, plus total volume; values not shown]

PCR primers: dilute each pair together to 3 µM each primer in TE+Tween.

Thermal cycler programs

Note: The final holds are safe stopping points and the reaction products are stable, so you can choose a final holding temperature to minimize the instrument's energy consumption and fan noise. For example, 14 °C is the closest to room temperature at which an Applied Biosystems Veriti disables its heated lid. If you are nearby when the program finishes, you can transfer the samples to a refrigerator in order to power off the thermal cycler until you are ready for the next step.

Program 1, Digestion

Reaction volume: 5 µL

Program 2, Synthesis

Reaction volume: 10 µL

The number of PCR cycles is determined by the formula and may vary from one experiment to another.

Sample preparation

Genomic DNA must be free of histones and other proteins, e.g. isolated by a protocol that includes proteinase K treatment. Preferably it should not be incubated above 70 °C, to keep it double-stranded and avoid GC-content bias. DNA is best stored in a slightly basic buffer and a surfactant improves pipetting accuracy (see "Reagents"). This protocol generates successful libraries from formalin-fixed, paraffin-embedded (FFPE) samples but has not been biologically validated for that application.


Procedure

Digestion

Digestion Mix

[Volume calculator: 4 components in µL, scaled to the number of reactions, plus total volume; values not shown]

Note: The water volume may be reduced to use a greater volume of gDNA; change the aliquot volumes accordingly.

Safe stopping point: After digestion, the samples can stay in the thermal cycler or a refrigerator for a few days before you continue to library synthesis.

Library synthesis

Synthesis Mix

[Volume calculator: 2 components in µL, scaled to the number of reactions, plus total volume; values not shown]

Safe stopping point: After synthesis, the samples can stay in the thermal cycler or a refrigerator for a few days before you continue to cleanup.

Cleanup

Fresh 80% ethanol

[Volume calculator: 2 components in mL, scaled to the number of reactions, plus total volume; values not shown]

Note: Certified molecular biology-grade water is not required for the 80% ethanol. Low-retention pipet tips are not required for the next steps, until resuspending the pellet.

Note: Perform the remaining steps quickly, until the pellet is resuspended, to prevent the beads from overdrying. If you have a large number of samples, perform the next steps on only half of them before coming back to this point for the other half.


Next steps

Quantification and validation of the libraries

The final yield should be 10 μL of amplified library between 50 and 200 nM, with most of the fragments between 150 and 400 bp. The size distribution can be verified by running the library undiluted with a TapeStation High Sensitivity D1000 kit or equivalent. The electropherogram may show evidence of overamplification: a secondary bump or especially wide smear of molecules that migrate more slowly than the rest, because they comprise complementary annealed adapters and noncomplementary, unannealable inserts. These libraries can still be sequenced, but this artifact makes molarity and size-distribution estimates inaccurate, and it is ideal to recalibrate the PCR cycles to the maximum number that does not produce this artifact.
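If the instrument reports a mass concentration rather than molarity, the conversion can be sketched as below. The average mass of 660 g/mol per base pair of double-stranded DNA is a common approximation rather than a value from this protocol, and the function name and example numbers are hypothetical.

```python
def library_molarity_nm(conc_ng_per_ul, mean_length_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM)."""
    # ng/µL -> g/L is a factor of 1e-3; mol/L -> nmol/L is a factor of 1e9; net 1e6
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_length_bp)

# Hypothetical example: 10 ng/µL at a mean fragment length of 300 bp
molarity = library_molarity_nm(10.0, 300)
print(f"{molarity:.1f} nM; within 50-200 nM: {50 <= molarity <= 200}")
```

As noted above, an overamplified library makes both the concentration and the size estimate unreliable, so the result of this conversion is only as good as its inputs.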

To determine pooling volumes (if using cleanup for individual samples) and optimize loading concentrations, it is helpful to measure the concentration more precisely by qPCR, which measures sequenceable molecules rather than total DNA content. Roche's KAPA Library Quantification Kits are designed for this purpose.
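Equimolar pooling volumes follow directly from the measured concentrations, because 1 nM equals 1 fmol/µL. The sketch below is an illustration only: the function name, library names, concentrations, and target amount per library are hypothetical, and the 10 µL default for the available volume is taken from the expected final yield and ignores any volume already used for quality control.

```python
def pooling_volumes_ul(conc_nm, fmol_per_library, available_ul=10.0):
    """Volume of each library (µL) that contributes the same number of femtomoles."""
    volumes = {}
    for name, conc in conc_nm.items():
        volume = fmol_per_library / conc  # 1 nM equals 1 fmol/µL
        if volume > available_ul:
            raise ValueError(f"{name}: needs {volume:.2f} µL but only {available_ul} µL available")
        volumes[name] = volume
    return volumes

# Hypothetical qPCR results in nM
qpcr_nm = {"lib1": 50.0, "lib2": 100.0, "lib3": 200.0}
volumes = pooling_volumes_ul(qpcr_nm, fmol_per_library=100.0)
for name, volume in volumes.items():
    print(f"{name}: {volume} µL")
```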

Sequencing

A good target for the human genome is at least 40 million read pairs per library, e.g. up to 96 libraries on an Illumina NovaSeq S2 flow cell. Long reads (> 50 nt) are not useful because this method counts fragments, not bases, but paired-end reads are useful to count both ends of each fragment (use 2x50 on NovaSeq and NextSeq 1000/2000, 2x38 on NextSeq 500). Be sure to include the correct index read lengths for your indexing scheme (8 nt i7 + 8 nt i5 for combinatorial dual indexing, 10 nt i7 + no i5 for i7-only unique single indexing based on IDT/Illumina UDIs). No custom sequencing primers are required as the libraries have standard Nextera adapters. ΦX174 spike-in is not required, but a 1% spike-in is recommended in case the instrument fails and troubleshooting is required.
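The number of libraries that fit on a run can be estimated from the expected output of the flow cell. In the sketch below the function name is an assumption and the flow cell capacity is a hypothetical round number; check the instrument specifications for the real figure.

```python
def max_libraries(flow_cell_read_pairs, read_pairs_per_library=40_000_000, phix_fraction=0.01):
    """Number of libraries that each receive the target depth after the spike-in share."""
    usable = flow_cell_read_pairs * (1 - phix_fraction)
    return int(usable // read_pairs_per_library)

# Hypothetical capacity of 4 billion read pairs; not an instrument specification
print(max_libraries(4_000_000_000))
```

This assumes perfectly even pooling; in practice some margin is needed so that the least-represented library still reaches the target depth.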

Because of the MspJI restriction motif, more than 50% of bases at position 17 in both reads will be G and more than 50% at position 14 will be C or T. This does not interfere with sequencing performance. The organism's methylation motif will also be reflected here, e.g. a human genome with CpG methylation will also be more than 50% C at position 16. GC content throughout the reads may also be higher than across the entire genome if DNA methylation is enriched in GC-rich regions, as in the human genome.
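These expectations can be checked on a sample of reads with a per-position base tally. The sketch below is an illustration, not part of FMLtools: the function name is an assumption and the toy reads are made up so that three of the four carry C or T at position 14, C at position 16, and G at position 17.

```python
from collections import Counter

def base_fractions(reads, position):
    """Fraction of each base at a 1-based read position, ignoring reads that are too short."""
    counts = Counter(read[position - 1] for read in reads if len(read) >= position)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}

# Toy reads: 13 filler bases, then positions 14-17, then 3 more filler bases
reads = [
    "A" * 13 + "CACG" + "TTT",
    "G" * 13 + "TGCG" + "AAA",
    "C" * 13 + "CTCG" + "CCC",
    "T" * 13 + "AAAA" + "GGG",  # does not match the motif
]
for position in (14, 16, 17):
    print(position, base_fractions(reads, position))
```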

Data processing

The sequence reads can be aligned to a reference genome with any standard aligner. The restriction digestion fragments DNA at a small number of specific positions, so duplicate fragments will frequently occur by chance; do not remove duplicates.

Suggested pipeline (implemented in FMLtools):

  1. Demultiplex pooled libraries with Illumina software.
  2. Trim Nextera adapter sequences with CutAdapt.
  3. Align reads to the reference genome with bwa-mem2.
  4. Sort and index alignments with Samtools.
  5. Count hits per motif site with count_fml_hits.py.
  6. Aggregate hits per genome region with region_counts.py.
  7. Analyze read counts with DESeq2.

Required one-time preparation:

  1. Index the reference genome sequence with the aligner.
  2. Index motif sites with get_sequence_positions.py.
  3. Obtain a list of genome regions of interest (e.g. promoters) in BED format.
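Pipeline steps 2 to 4 use widely available tools and can be sketched as command lines built in Python. This is a hedged illustration, not the FMLtools implementation: the file-naming scheme, function name, and thread count are assumptions, and the commands are only printed, not run. Steps that depend on Illumina software, the FMLtools scripts, or DESeq2 are omitted because their invocations are not described here.

```python
import shlex

# Adapter sequence that appears at the 3' end of reads that run through a Nextera insert
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"

def pipeline_commands(sample, ref_fasta, threads=4):
    """Build argument lists for trimming, alignment, sorting, and indexing of one sample."""
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    t1, t2 = f"{sample}_R1.trimmed.fastq.gz", f"{sample}_R2.trimmed.fastq.gz"
    sam, bam = f"{sample}.sam", f"{sample}.bam"
    return [
        ["cutadapt", "-a", NEXTERA_ADAPTER, "-A", NEXTERA_ADAPTER, "-o", t1, "-p", t2, r1, r2],
        ["bwa-mem2", "mem", "-t", str(threads), "-o", sam, ref_fasta, t1, t2],
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
    ]

# Hypothetical sample and reference names
for command in pipeline_commands("sample1", "genome.fa"):
    print(shlex.join(command))
```

There is deliberately no duplicate-marking or duplicate-removal step, in line with the advice above. The reference must already have been indexed with the aligner as part of the one-time preparation.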

Oligonucleotide designs (Illumina-specific)

Oligonucleotide sequences © 2021 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited.

All sequences are in the order 5′→3′ and all nucleotides are DNA. All cytosines are unmethylated.

Abbreviations

Library synthesis adapters

Use HPLC purification. This reduces the complexity of the random bases, but that is less important than the purity of full-length molecules. These oligos form secondary structures in moderate salt conditions but that should not affect spectrophotometry in storage buffer. Although they are used for ligation, the adapters must not have a 5′ phosphate; synthesis without it is standard so simply do not request the optional 5′ phosphate.

Indexing PCR primers

Use IDT's Ultramer synthesis or HPLC purification. Choose any indexing scheme and order Nextera-compatible DNA primers using the provided designs, keeping most of the sequence constant and varying only the index. Combinatorial dual indexing is inexpensive but requires careful arrangement of the index pairs. Alternatively, use a P5 primer with no index and only the i7 indexes from the IDT/Illumina unique dual indexing scheme for Nextera, which is more expensive but less difficult to prepare and saves sequencing cycles that can be used to increase the read lengths for the insert instead. Unique dual indexing is not likely to be helpful for FML-seq. TruSeq primers are not compatible and index sequences validated with TruSeq adapters are not recommended for these Nextera adapters.
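For combinatorial dual indexing, one simple arrangement gives every well a distinct index pair by using one i7 primer per plate column and one i5 primer per plate row. The sketch below assumes that convention; the function name and the primer names are hypothetical placeholders, not actual index identifiers.

```python
from itertools import product

def combinatorial_layout(i7_names, i5_names):
    """Map each well to a unique (i7, i5) pair: one i7 per column, one i5 per row."""
    row_letters = "ABCDEFGHIJKLMNOP"
    if len(i5_names) > len(row_letters):
        raise ValueError("too many i5 primers for the available row letters")
    rows = row_letters[: len(i5_names)]
    return {
        f"{row}{col}": (i7, i5)
        for (row, i5), (col, i7) in product(zip(rows, i5_names), enumerate(i7_names, start=1))
    }

# Hypothetical primer names: 12 i7 primers by 8 i5 primers cover a 96-well plate
i7 = [f"i7_{n:02d}" for n in range(1, 13)]
i5 = [f"i5_{c}" for c in "ABCDEFGH"]
layout = combinatorial_layout(i7, i5)
print(len(layout), "wells;", "A1 =", layout["A1"], "; H12 =", layout["H12"])
```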


Quick reference protocol

Digestion

Digestion Mix: 4 µL + 1 µL gDNA, then start Program 1

Synthesis

Synthesis Mix: add 4 µL + 1 µL primers, then start Program 2

Cleanup

Fresh 80% ethanol