Understanding transcriptional regulation by integrative analysis of transcription factor binding data

  1. Mark Gerstein (1, 2, 11, 12)
  1. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA;
  2. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA;
  3. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong;
  4. Program in Bioinformatics and Integrative Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01655, USA;
  5. Center for Genomic Regulation (CRG) and UPF, 08003 Barcelona, Spain;
  6. Genome Institute of Singapore, Singapore 138672;
  7. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA;
  8. RIKEN Omics Science Center, Yokohama Institute, Yokohama, Kanagawa 230-0045, Japan;
  9. European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, United Kingdom;
  10. Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA;
  11. Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA

    Abstract

    Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply such models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSSs can be precisely reflected by the difference in TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features, such as histone modifications and DNase hypersensitivity, in determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
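The abstract does not spell out the models themselves. As a minimal sketch, assuming a plain linear model (ordinary least squares) fitted on log2-transformed signals with a pseudocount of 1, the code below shows how TF-binding signals at each TSS could be regressed against TSS expression, how prediction accuracy could be scored as the Pearson correlation between predicted and observed log expression, and how per-TF log fold changes between two cell lines could serve as inputs for a quantitative (rather than on/off) model of differential expression. This is not necessarily the model used in the study; the function names, the pseudocount, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

PSEUDOCOUNT = 1.0  # assumption: avoids log(0) for TSSs with no signal


def log_transform(values: Sequence[float]) -> List[float]:
    """Return log2(value + PSEUDOCOUNT) for each non-negative value."""
    return [math.log2(v + PSEUDOCOUNT) for v in values]


def log_fold_change(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-TF log2((a + PSEUDOCOUNT) / (b + PSEUDOCOUNT)) between two cell lines."""
    return [x - y for x, y in zip(log_transform(a), log_transform(b))]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation updates a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(features: List[List[float]], target: List[float]) -> List[float]:
    """Least-squares fit of target ~ intercept + features via the normal equations."""
    design = [[1.0] + list(row) for row in features]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # [intercept, coef_TF1, coef_TF2, ...]


def predict(coefs: List[float], features: List[List[float]]) -> List[float]:
    """Predicted log expression for each TSS."""
    return [coefs[0] + sum(c * x for c, x in zip(coefs[1:], row)) for row in features]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, used here as the prediction-accuracy score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical toy data: binding signals of two TFs at five TSSs.
    signals = [[0, 0], [1, 1], [3, 0], [7, 3], [15, 1]]
    expression = [1, 3, 31, 31, 255]
    X = [log_transform(row) for row in signals]
    y = log_transform(expression)
    coefs = fit_ols(X, y)
    print("coefficients:", [round(c, 6) for c in coefs])
    print("Pearson r:", round(pearson(predict(coefs, X), y), 6))
```

In practice the model would be fitted on one subset of TSSs and scored on held-out TSSs; the toy data here are constructed to fit exactly, so they only demonstrate the mechanics.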

    Footnotes

    • 12 Corresponding author

      E-mail mark.gerstein{at}yale.edu

    • [Supplemental material is available for this article.]

    • Article and supplemental material are at http://www.genome.org/cgi/doi/10.1101/gr.136838.111.

      Freely available online through the Genome Research Open Access option.

    • Received December 21, 2011.
    • Accepted April 30, 2012.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.
