Chapter Four - 4C Technology: Protocols and Data Analysis

https://doi.org/10.1016/B978-0-12-391938-0.00004-5

Abstract

Chromosome conformation capture (3C) technology and its genome-wide derivatives have revolutionized our knowledge of chromatin folding and nuclear organization. 4C technology combines 3C principles with high-throughput sequencing (4C-seq) to enable unbiased genome-wide screens for DNA contacts made by single genomic sites of interest. Here, we discuss in detail the design, application, and data analysis of 4C-seq experiments. Based on many hundreds of different 4C-seq experiments, we define criteria to assess data quality and show how different restriction enzymes and cross-linking conditions affect results. We describe in detail the mapping strategy of 4C-seq reads and show advanced strategies for data analysis.

Introduction

Grasping the size of our genome in relation to that of the cell nucleus that contains it is nearly impossible. The three billion base pairs that form the genome, typically indicated by the letters G, A, T, and C, would spell out a sentence 9000 km long if written in Times Roman at font size 12. In reality, even the best microscope cannot visualize single nucleotides, let alone read the code. Still, if you were to stretch out the genome, it would measure an impressive 2 m. This 2 m of DNA is typically stored in a nucleus with a diameter of roughly 10 μm. Adding up all the DNA in your body would therefore give a rope 40,000 times as long as the distance from the earth to the moon.

While incomprehensible, these dimensions do teach us that a tremendous degree of packing, bending, and twisting is required to fit the genome in a cell nucleus. Until a decade ago, we had little understanding of the details of the shape of our genome. The development of chromosome conformation capture (3C) technology (Dekker, Rippe, Dekker, & Kleckner, 2002), however, changed our ability to study genome topology, and consequently altered our view on genome function and nuclear organization. 3C experiments demonstrated that remote mammalian enhancers physically loop to the genes they control (Tolhuis, Palstra, Splinter, Grosveld, & de Laat, 2002). In addition, transcription factors were shown to be responsible for the formation of chromatin loops between genes and regulatory DNA sequences (Drissen et al., 2004; Spilianakis and Flavell, 2004), and other proteins were found to bring together genomic sites that are not genes (Hadjur et al., 2009; Splinter et al., 2006).

3C technology is referred to as a one-versus-one method, as it analyzes interactions between selected pairs of DNA sites. The strategy relies on formaldehyde cross-linking of proteins to proteins and to DNA, the subsequent digestion of cross-linked DNA by restriction enzymes (REs), the ligation of cross-linked DNA fragments, and the quantification of selected ligation junctions by PCR. If two distal sites on the linear chromosome form more ligation junctions with each other than with intervening sequences, a chromatin loop is inferred to exist between these sites in vivo (Dekker, 2006; Simonis et al., 2007).
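
The comparison described above can be sketched in a few lines of Python. This is only an illustration of the stated criterion, not a published analysis method: the function name `loop_candidate`, the positions (in kb from the bait), and the relative ligation frequencies are all hypothetical, and a real 3C analysis would also need normalization and replicate measurements.

```python
def loop_candidate(freq_by_pos, bait_pos, target_pos):
    """Return True if the target forms more ligation junctions with the bait
    than any tested fragment lying between bait and target.

    freq_by_pos maps a fragment position to its relative ligation frequency
    with the bait (hypothetical, already normalized values).
    """
    lo, hi = sorted((bait_pos, target_pos))
    intervening = [f for pos, f in freq_by_pos.items() if lo < pos < hi]
    if not intervening:
        raise ValueError("no intervening fragments were tested")
    return freq_by_pos[target_pos] > max(intervening)


# Hypothetical relative ligation frequencies; keys are distances in kb from the bait.
freq = {10: 1.0, 20: 0.6, 30: 0.4, 40: 0.3, 50: 1.8, 60: 0.2}
print(loop_candidate(freq, bait_pos=0, target_pos=50))
print(loop_candidate(freq, bait_pos=0, target_pos=60))
```

In this toy data set the fragment at 50 kb ligates to the bait more often than every fragment in between, so it meets the criterion, whereas the fragment at 60 kb does not.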

The availability of 3C technology triggered the further development of genome-scale variants thereof. These include 4C technology (one-versus-all) (Simonis et al., 2006; Zhao et al., 2006), 5C technology (many-versus-many) (Dostie et al., 2006), ChIA-PET (an approach combining chromatin immunoprecipitation (ChIP) and 3C) (Fullwood et al., 2009), and Hi-C (all-versus-all) (Lieberman-Aiden et al., 2009). Each strategy has unique advantages and disadvantages, and the method of choice depends on the specific research question asked (de Wit and de Laat, 2012; van Steensel and Dekker, 2010). Collectively, they have been quickly adopted by the large community studying nuclear organization, gene regulation, and DNA replication, leading to an ever-increasing body of work showing how the shape of the genome affects its function (Ong and Corces, 2011; Splinter et al., 2011).

Here, we will focus on 4C technology, a technology designed to study in a detailed and unbiased manner the DNA contacts made across the genome by a given genomic site of interest. 4C technology has been applied to demonstrate that individual gene loci can be engaged in many long-range DNA contacts with loci elsewhere on the same chromosome and on other chromosomes (Simonis et al., 2006). It confirmed, at a much higher resolution, observations made earlier by microscopy that active and inactive chromatin separate in the nucleus (Noordermeer, Leleu, et al., 2011; Simonis et al., 2006; Splinter et al., 2011). In Drosophila, 4C was used to demonstrate that genes that are bound by Polycomb group (PcG) proteins and are far apart on the chromosome frequently meet in nuclear space (Bantignies et al., 2011; Tolhuis et al., 2011). Long-range interactions among conserved noncoding sequences were identified by 4C (Robyr et al., 2011), and it was applied to demonstrate that regulatory sequences cannot autonomously decide where to go in the nucleus; for their overall spatial positioning, they instead rely on their chromosomal context (Hakim et al., 2011; Noordermeer, de Wit, et al., 2011). 4C technology and variants thereof have not only been applied to identify contacts between larger genomic regions, but have now also started to be used for the identification of more local and defined interactions between regulatory sequences (Lower et al., 2009; Montavon et al., 2011; Soler et al., 2010). A robust standard protocol for this application still seems to be lacking, though. Finally, 4C not only allows the detection of three-dimensional DNA contacts, it also, and primarily, picks up sequences that are close in space because they happen to be proximal on the linear chromosome template. This realization has led to an unexpected application of 4C technology in molecular diagnostics, as a robust strategy for the identification and fine-mapping of balanced and unbalanced chromosomal rearrangements near sites of interest. In leukemia, for example, 4C has already enabled the detection of multiple novel translocations and inversions and the discovery of several new oncogenes (Homminga et al., 2011; Simonis et al., 2009).

Based on many hundreds of different 4C experiments, most of them unpublished, we will explain how to design, perform, and analyze 4C experiments, and how to judge data quality.

Principles of 4C technology

4C technology enables the identification of all regions in the genome that contact a genomic site of interest. We refer to such sites of interest as “viewpoints” or “baits,” while contacting regions that are cross-linked and ligated to the viewpoint are called “captures.” An outline of 4C technology is given in Fig. 4.1. In brief, cells are treated with formaldehyde, which cross-links proteins to proteins and to DNA. Cross-linked chromatin is subsequently digested with a primary RE that creates

Primer design 4C sequencing

The RE fragends captured by the viewpoint are amplified by an inverse PCR. The primers are designed to point outward from the viewpoint (Fig. 4.2A). To circumvent the further library preparation steps otherwise necessary for Illumina sequencing, the primers are designed with 5′ overhangs encoding the Illumina single-end sequencing adapters P5 and P7. As a result of this strategy, each read from the sequencer first shows the PCR primer sequence (i.e., the part complementary to the viewpoint) and then the

High-Throughput Sequencing of 4C PCR Products

Illumina's Genome Analyzer II and HiSeq 2000 can both be used to sequence 4C-seq libraries. The current HiSeq 2000 generates more than 100 million 4C-seq reads, filtered for high quality, in a single lane. However, 4C-seq can cause specific sequencing problems. One major concern is the possibility of an identical nucleotide in all reads at the same position or cycle. This occurs when all the reading primers are of the same length and all end with the same restriction site sequence,
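
To make the concern about identical nucleotides per cycle concrete, the following Python sketch flags cycles at which one base dominates across reads. It is an illustrative check only, not part of the authors' protocol: the function name `low_diversity_cycles`, the 0.9 threshold, and the toy reads (a shared 8-nt prefix followed by variable sequence) are assumptions made for this example.

```python
from collections import Counter


def low_diversity_cycles(reads, threshold=0.9):
    """Return 0-based cycle indices at which a single nucleotide accounts
    for at least `threshold` of all reads that reach that cycle."""
    flagged = []
    max_len = max((len(r) for r in reads), default=0)
    for cycle in range(max_len):
        counts = Counter(r[cycle] for r in reads if len(r) > cycle)
        total = sum(counts.values())
        most_common_count = counts.most_common(1)[0][1]
        if most_common_count / total >= threshold:
            flagged.append(cycle)
    return flagged


# Hypothetical reads: same primer length, same restriction site (GATC) at the same cycles.
reads = ["TTGAGATCAAGT", "TTGAGATCCGTA", "TTGAGATCGTAC", "TTGAGATCTACG"]
print(low_diversity_cycles(reads))
```

With these toy reads, every cycle within the shared primer and restriction site is flagged, while the cycles covering the captured sequence are not.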

Mapping of 4C-seq reads to genome

The mapping of 4C-seq reads onto a reference genome differs from that of other next-generation sequencing applications. We have developed a custom analysis pipeline written in Perl to process the 4C-seq data. The first step of mapping the data is to bin the reads according to the reading primers and barcodes used in each lane. Usually, more than 90% (median 96%, N = 49 lanes) of the reads that passed the Illumina quality filter can be binned. Each bin represents a single 4C-seq
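
The binning step can be illustrated with a short Python sketch. This is not the authors' Perl pipeline; it assumes exact prefix matching with no mismatches allowed, and the function name `bin_reads`, the experiment names, and the primer sequences are hypothetical.

```python
from collections import defaultdict


def bin_reads(reads, primers):
    """Assign each read to the first experiment whose reading primer
    (optionally including a barcode) the read starts with.

    Returns (bins, unassigned): a dict of experiment name -> list of reads,
    and a list of reads that matched no primer.
    """
    bins = defaultdict(list)
    unassigned = []
    for read in reads:
        for name, primer in primers.items():
            if read.startswith(primer):
                bins[name].append(read)
                break
        else:
            # No primer matched this read.
            unassigned.append(read)
    return dict(bins), unassigned


# Hypothetical reading primers, one per 4C-seq experiment in the lane.
primers = {"viewpoint_A": "ACGTGATC", "viewpoint_B": "TTCAGATC"}
reads = ["ACGTGATCAAAA", "TTCAGATCCCCC", "ACGTGATCGGGG", "GGGGGGGGGGGG"]

bins, unassigned = bin_reads(reads, primers)
binned_fraction = 1 - len(unassigned) / len(reads)
print({name: len(group) for name, group in bins.items()}, binned_fraction)
```

The binned fraction computed here corresponds to the per-lane percentage quoted above. If one primer is a prefix of another, the first match wins in this sketch, so primers would need to be ordered from longest to shortest.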

Acknowledgments

This work was financially supported by grants from the Dutch Scientific Organization (NWO) (91204082 and 935170621) and a European Research Council Starting Grant (209700, “4C”)

References (37)

  • R. Drissen et al. The active spatial organization of the beta-globin locus requires the transcription factor EKLF. Genes & Development (2004)
  • M.J. Fullwood et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature (2009)
  • P.G. Giresi et al. FAIRE (formaldehyde-assisted isolation of regulatory elements) isolates active regulatory elements from human chromatin. Genome Research (2007)
  • S. Hadjur et al. Cohesins form chromosomal cis-interactions at the developmentally regulated IFNG locus. Nature (2009)
  • O. Hakim et al. Diverse gene reprogramming events occur in the same spatial clusters of distal regulatory elements. Genome Research (2011)
  • W.J. Kent et al. The human genome browser at UCSC. Genome Research (2002)
  • M. Krzywinski et al. Circos: An information aesthetic for comparative genomics. Genome Research (2009)
  • E. Lieberman-Aiden et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science (2009)