Life Science Alliance
Research Article
Transparent Process
Open Access

Expression regulation of genes is linked to their CpG density distributions around transcription start sites

Hao Tian, Yueying He, Yue Xue, Yi Qin Gao
Hao Tian
1Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
Roles: Conceptualization, Formal analysis, Visualization, Methodology, Writing—original draft, review, and editing
Yueying He
1Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
Roles: Conceptualization, Formal analysis, Visualization, Methodology, Writing—original draft
Yue Xue
1Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
Roles: Conceptualization, Formal analysis, Visualization, Methodology, Writing—original draft
Yi Qin Gao
1Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
2Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
3Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
Roles: Conceptualization, Supervision, Funding acquisition, Methodology, Writing—review and editing
  • For correspondence: gaoyq@pku.edu.cn
Published 17 May 2022. DOI: 10.26508/lsa.202101302

Figures

  • Figure 1.
    Figure 1. Gene classification based on CpG distribution.

    (A) Overview of the network-training process. (B) The distribution of the CG likelihood of human genes. (C) The CpG density distribution of genes in each cluster (left, heatmap; right, cluster averages). (D) The tissue specificity of genes in the three clusters. For each gene, the maximum tissue specificity across 37 tissues (as defined in the Materials and Methods section) is plotted; testis was excluded because it contains too many tissue-specific genes. P-values = 3.12 × 10^−69 (cluster 1 versus cluster 2), 6.37 × 10^−276 (cluster 1 versus cluster 3), and 4.28 × 10^−106 (cluster 2 versus cluster 3) by Welch’s unequal variance t test. (E) The nucleosome occupancy patterns (measured by MNase-seq) of genes in the three clusters in GM12878. (F) The distribution of genomic distance between transcription factor (TF)–binding sites and the gene transcription start site (TSS). Positive and negative values indicate that TFs bind downstream and upstream of the TSS, respectively. P-value = 2.18 × 10^−9 for clusters 1 and 2 and P-value = 0.04 for clusters 1 and 3 (Welch’s unequal variance t test). (G) The CpG density distribution of c1-HKGs and c1-TSGs (left, heatmap; right, averages). (H) The proportion of c1-HKGs/h-c1-TSGs bound by a given TF. Each data point represents one TF, and its color represents that TF’s tissue specificity. P-value = 4.12 × 10^−11 by t test.
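Several panels in this caption compare clusters with Welch’s unequal variance t test. As a rough illustration only, the sketch below computes the Welch t statistic and the Welch–Satterthwaite degrees of freedom in plain Python. The function name `welch_t` and the input values are hypothetical and are not taken from the article; a P-value would then be read from a t distribution with the returned degrees of freedom, for example with a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Made-up maximum tissue-specificity scores for two clusters (illustration only).
cluster1 = [0.10, 0.20, 0.15, 0.30, 0.25]
cluster3 = [0.60, 0.90, 0.40, 0.80, 0.70, 0.95]
t_stat, dof = welch_t(cluster1, cluster3)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Unlike Student’s t test, this form does not pool the variances, which is why it suits groups of different size and spread.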

  • Figure S1.
    Figure S1. Gene classification based on CpG distribution.

    (A) The distribution of the tissue-specific genes of a given tissue among the three clusters. (B) The distribution of genomic distance between transcription factor (TF)–binding sites and the gene transcription start site (TSS). Positive and negative values indicate that TFs bind downstream and upstream of the TSS, respectively. (C) The proportion of h-c1-TSGs/h-c3-TSGs bound by a given TF. Each data point represents one TF, and its color represents that TF’s tissue specificity. (D) Comparison of the insulation score between c1-HKGs and c1-TSGs in the liver. The window size used here is 480 kb (see the Materials and Methods section). P-value = 9.4054 × 10^−5 by Welch’s unequal variance t test. (E) The overlap between the gene clusters identified here and those identified by Weber et al (2007). (F) The proportion of genes whose promoter regions overlap with a CGI in each cluster. (G) The proportion of genes whose promoter regions overlap with a liver nonmethylated island. (H) The CpG density distribution of housekeeping genes and tissue-specific genes within each cluster identified by the RNN and by Weber et al (2007). (I) The SD of tissue specificity of each cluster identified by the RNN and by Weber et al (2007).

  • Figure 2.
    Figure 2. The relation between sequence property and gene expression.

    (A) The Spearman correlation between gene expression and the CpG density of each nonoverlapping 40-bp window (see the Materials and Methods section); the expression levels of the three gene clusters in these four cells are shown in Fig S6. For each cell, P-value < 0.001 when the two correlation curves were compared (t test). (B) The Spearman correlation between gene expression and the TpG density of each nonoverlapping 40-bp window. For each cell, P-value < 0.001 when the two correlation curves were compared (t test).
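The per-window correlation described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: CpG density is taken here as the number of CG dinucleotides in a window divided by the window length (the article's exact definition is in its Materials and Methods section), dinucleotides that span a window boundary are ignored, and all function names, sequences, and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean


def window_cpg_density(seq, window=40):
    """CG count per nonoverlapping window, divided by the window length."""
    seq = seq.upper()
    return [seq[s:s + window].count("CG") / window
            for s in range(0, len(seq) - window + 1, window)]


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


# Made-up 80-bp sequences and expression values for three hypothetical genes.
genes = {
    "geneA": ("CG" * 20 + "AT" * 20, 50.0),
    "geneB": ("CGAT" * 10 + "AT" * 20, 20.0),
    "geneC": ("AT" * 40, 1.0),
}
densities = [window_cpg_density(seq) for seq, _ in genes.values()]
expression = [expr for _, expr in genes.values()]
# Correlate, across genes, the density in the first window with expression.
rho_first_window = spearman([d[0] for d in densities], expression)
```

Repeating the last step for every window index gives one correlation value per window position, which is the kind of curve the caption describes. The same functions apply to TpG density if `"CG"` is replaced by `"TG"`.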

  • Figure S2.
    Figure S2. The Spearman correlation between the gene expression level and CpG density for three clusters in 10 human tissues.

    Ten subplots correspond to 10 tissues (liver, cortex BA9, hippocampus, lung, left ventricle, spleen, ovary, adrenal, aorta, and pancreas).

  • Figure S3.
    Figure S3. The Spearman correlation between the gene expression level and CpG density in tumor cells and corresponding paracancerous (normal) cells.

    (A, B, C) represent the correlation levels in clusters 1, 2 and 3, respectively. Each subplot represents one tumor (and corresponding normal tissue) type. The expression data were downloaded from TCGA.

  • Figure S4.
    Figure S4. The Spearman correlation between the gene expression level and TpG density in 10 human tissues.

    (A, B, C) represent the correlation levels in clusters 1, 2 and 3, respectively. Each subplot represents one human tissue.

  • Figure S5.
    Figure S5. The Spearman correlation between the gene expression level and TpG density in tumor cells and corresponding paracancerous cells.

    (A, B, C) represent the correlation in clusters 1, 2 and 3, respectively. Each subplot represents one tumor (and corresponding normal tissue) type. The expression data were downloaded from TCGA.

  • Figure S6.
    Figure S6. The expression level of three gene clusters in two-cell, eight-cell, hES, and ICM.

    Each subplot represents one cell. Genes in clusters 1 and 3 have the highest and lowest expression levels, respectively.

  • Figure 3.
    Figure 3. Distinct regulatory mechanisms among three gene clusters.

    (A, B) The distribution of H3K27me3 (A) and H3K4me3 (B) among the three gene clusters in the liver. Each heatmap was ranked based on the gene expression level. ChIP-seq signal represents the fold change (ChIP-seq counts relative to control). (C, D) The Spearman correlation between the gene expression level and epigenetic marks (C) and between TpG density and H3K27me3 (D) (see the Materials and Methods section). For Fig 3C, all P-values < 10^−7, and for Fig 3D, all P-values < 10^−4 (t test).

  • Figure S7.
    Figure S7. Distinct regulatory mechanisms among three gene clusters.

    (A, B, C, D) The distribution of H3K9me3 (A), H3K27ac (B), H3K36me3 (C), and DNA methylation (D) among the three gene clusters in the liver. Each heatmap was ranked based on the gene expression level. (E) The Pearson correlation coefficient between the compartment index and gene expression level among different tissues. P-value = 1.16 × 10^−6 for clusters 1 and 2; P-value = 0.0067 for clusters 2 and 3 (Welch’s unequal variance t test).

  • Figure S8.
    Figure S8. The Spearman correlation between gene expression level and epigenetic marks.

    (A, B, C) The Spearman correlation between the gene expression level and epigenetic marks in the lung (A), ovary (B), and sigmoid colon (C). Each subplot represents the correlation levels between expression and one epigenetic mark in three gene clusters.

  • Figure 4.
    Figure 4. Expression and epigenetic changes in carcinogenesis.

    (A) The expression level (TPM) of genes in different clusters in cancer and normal samples (upper) and the log2 (expression fold change) in carcinogenesis calculated by DESeq2 (lower). (B) The proportion of DE genes (in carcinogenesis) in the three clusters. Upper: the proportion of up-expressed genes; lower: the proportion of down-expressed genes. (C) The CpG density distribution of up-expressed, down-expressed, and other genes. COAD (colon cancer) was used here for illustration. See Fig S12 for more instances. (D) The DNA methylation level of genes in different clusters in normal (left) and tumor (middle) cells and the methylation changes during carcinogenesis for the three gene clusters (right). COAD was used for illustration. See Figs S14 and S15 for more instances.
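The per-cluster proportions in panel B can be illustrated with a short sketch. It assumes that each gene already has a log2 fold change (for example, exported from DESeq2) and a cluster label, and it calls a gene up- or down-expressed using a hypothetical cutoff of |log2 fold change| ≥ 1. That cutoff, the function name `de_proportions`, and the example values are assumptions made for illustration; the article's actual criteria for DE genes are not stated in this caption.

```python
from collections import Counter


def de_proportions(log2fc, cluster, threshold=1.0):
    """Map each cluster to (fraction up-expressed, fraction down-expressed)."""
    totals = Counter(cluster.values())  # number of genes per cluster
    up, down = Counter(), Counter()
    for gene, lfc in log2fc.items():
        label = cluster[gene]
        if lfc >= threshold:
            up[label] += 1
        elif lfc <= -threshold:
            down[label] += 1
    return {label: (up[label] / n, down[label] / n)
            for label, n in totals.items()}


# Made-up log2 fold changes (tumor versus normal) and cluster labels.
log2fc = {"g1": 2.0, "g2": -1.5, "g3": 0.2, "g4": 1.2, "g5": -0.1, "g6": -3.0}
cluster = {"g1": 1, "g2": 1, "g3": 1, "g4": 2, "g5": 2, "g6": 3}
proportions = de_proportions(log2fc, cluster)
```

Dividing by the cluster size rather than by the number of DE genes keeps the proportions comparable between clusters of different size.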

  • Figure S9.
    Figure S9. The expression fold change of genes during tumorigenesis.

    (A) The volcano plots obtained from DESeq2 analysis without shrinkage (the upper figure) and with shrinkage (the bottom figure). (B) Two examples (SLC16A1 and BSG) belonging to cluster 1 and up-regulated in a variety of cancer types. (C) The boxplots for expression fold change after shrinkage.

  • Figure S10.
    Figure S10. The variation of expression level in carcinogenesis.

    (A) The expression level (TPM) of tissue-specific genes (belonging to the corresponding normal sample) in normal and cancer samples, and the log2 (expression fold change) in carcinogenesis calculated by DESeq2. (B) The expression level (TPM) of complementary tissue-specific genes in normal and cancer samples and the log2 (expression fold change) in carcinogenesis calculated by DESeq2.

  • Figure S11.
    Figure S11. The proportion of DE genes (in carcinogenesis) identified based on shrunken fold changes in three clusters for various cancer types.

    Upper: the proportion of up-expressed genes; lower: the proportion of down-expressed genes.

  • Figure S12.
    Figure S12. The CpG density distribution of up-expressed, down-expressed, and other genes.

    (A, B, C, D, E) represent BRCA, LIHC, LUAD, LUSC, and STAD, respectively. Each subplot shows the CpG density distributions of up-expressed, down-expressed, and stable genes.

  • Figure S13.
    Figure S13. The GO functions of up-expressed and down-expressed genes of different clusters.

    (A, B) COAD (A) and BLCA (B) were used here for illustration.

  • Figure S14.
    Figure S14. The variation of epigenetic marks in carcinogenesis.

    (A) Upper: the H3K27me3, H3K9me3, and H3K4me3 levels of up-expressed, down-expressed, and other genes within cluster 2 in the corresponding normal cells. Lower: the DNA methylation level of up-expressed, down-expressed, and other genes within cluster 2 in normal and tumor cells and the methylation change during carcinogenesis. (B) The DNA methylation level of genes in different clusters in normal (left) and tumor (middle) cells and the methylation changes during carcinogenesis for the three gene clusters (right).

  • Figure S15.
    Figure S15. The variation of DNA methylation level in carcinogenesis.

    (A, B, C) The DNA methylation level of genes in different clusters in normal (left) and tumor (middle) cells and the methylation changes during carcinogenesis for the three gene clusters (right). (D) Three typical examples (GATA4, HOXD11, and HOXD12) that belong to cluster 2 and are hypermethylated around the transcription start site.

  • Figure S16.
    Figure S16. Comparison of correlation patterns (between epigenetic marks and the expression level) between clusters identified in this study and by Weber et al (2007).

    Each subplot represents the correlation levels between expression and one epigenetic mark.

  • Figure S17.
    Figure S17. The CpG density of housekeeping transcription factor (TF)–binding sites and tissue-specific TF-binding sites.

    Compared with tissue-specific TFs, housekeeping TFs tend to bind to regions with higher CpG density.

Supplementary Materials

  • Table S1. Data sources. [LSA-2021-01302_TableS1.xlsx]

Gene classification based on semi-supervised neural network
Hao Tian, Yueying He, Yue Xue, Yi Qin Gao
Life Science Alliance May 2022, 5 (9) e202101302; DOI: 10.26508/lsa.202101302

Volume 5, No. 9
September 2022
Subjects

  • Systems & Computational Biology

ISSN: 2575-1077
© 2025 Life Science Alliance LLC

Life Science Alliance is registered as a trademark in the U.S. Patent and Trade Mark Office and in the European Union Intellectual Property Office.