Life Science Alliance
Resource
Transparent Process
Open Access

A comprehensive resource for retrieving, visualizing, and integrating functional genomics data

Matthias Blum, Pierre-Etienne Cholley, Valeriya Malysheva, Samuel Nicaise, Julien Moehlin, Hinrich Gronemeyer, Marco Antonio Mendoza-Parra
Matthias Blum
1Department of Functional Genomics and Cancer, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labelisée Ligue Contre le Cancer, Illkirch, France
Pierre-Etienne Cholley
1Department of Functional Genomics and Cancer, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labelisée Ligue Contre le Cancer, Illkirch, France
Valeriya Malysheva
1Department of Functional Genomics and Cancer, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labelisée Ligue Contre le Cancer, Illkirch, France
Samuel Nicaise
1Department of Functional Genomics and Cancer, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labelisée Ligue Contre le Cancer, Illkirch, France
Julien Moehlin
1Department of Functional Genomics and Cancer, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labelisée Ligue Contre le Cancer, Illkirch, France
Hinrich Gronemeyer
1Department of Functional Genomics and Cancer, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labelisée Ligue Contre le Cancer, Illkirch, France
2Centre National de la Recherche Scientifique, UMR7104, Illkirch, France
3Institut National de la Santé et de la Recherche Médicale, U1258, Illkirch, France
4Université de Strasbourg, Illkirch, France
  • For correspondence: hg@igbmc.fr
Marco Antonio Mendoza-Parra
1Department of Functional Genomics and Cancer, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labelisée Ligue Contre le Cancer, Illkirch, France
2Centre National de la Recherche Scientifique, UMR7104, Illkirch, France
  • For correspondence: mmendoza@genoscope.cns.fr
Published 9 December 2019. DOI: 10.26508/lsa.201900546

Figures

  • Figure 1.
    Figure 1. Schematic illustration of the qcGenomics workflow.

    An automated web routine identifies and indexes novel datasets available in GEO and SRA and stores such indexes in a database. A job scheduler dispatches tasks to computational nodes. A job consists of downloading SRA files from the SRA database, converting SRA files to FASTQ files, aligning raw reads, and controlling the quality of mapped reads. Before each NGS-QC database update, similarity indexes used by qcComparator are generated for all pairwise comparisons of experiments, and experiments are enriched with annotated metadata.
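The job structure described in this caption can be sketched as follows. This is a minimal illustration only: the stage names are taken from the caption, but the `Job` class, the `dispatch` function, the round-robin assignment policy, and the accession strings are all hypothetical and are not taken from the qcGenomics implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Stage order as listed in the Figure 1 caption. The stages are placeholders:
# a real worker would call the corresponding external tools here.
STAGES = ["download_sra", "sra_to_fastq", "align_reads", "quality_control"]


@dataclass
class Job:
    accession: str                       # hypothetical SRA run identifier
    completed: List[str] = field(default_factory=list)


def run_job(job: Job) -> Job:
    """Run every stage in order, recording each one as completed."""
    for stage in STAGES:
        job.completed.append(stage)
    return job


def dispatch(accessions: List[str], n_nodes: int = 2) -> Dict[int, List[Job]]:
    """Assign jobs to computational nodes round-robin (an assumed policy)."""
    nodes: Dict[int, List[Job]] = {i: [] for i in range(n_nodes)}
    for k, accession in enumerate(accessions):
        nodes[k % n_nodes].append(run_job(Job(accession)))
    return nodes


# Placeholder accessions, for illustration only.
nodes = dispatch(["run_A", "run_B", "run_C"], n_nodes=2)
```

With three placeholder runs and two nodes, the first node receives two jobs and the second receives one, and every job passes through the four stages in the same order.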

  • Figure 2.
    Figure 2. Example of a comparative visualization with qcGenome browser NAVi of histone modification tracks and HiC-derived illustrations for the Sox2 region.

    Shown are the HiC-derived 3D chromatin organization illustrating the topologically associated domains; tracks for the enhancer/promoter-associated histone modifications H3K4me3, H3K4me1, and H3K27ac (track height 50 reads); and chromatin interaction loops from Promoter Capture Hi-C (PCHiC) experiments, illustrated as arcs. The displays are in mirror-image order, with human ESCs above (top panels) and ESC-derived neuroectodermal cells below (bottom panels) the genome coordinates and the genes in this region. The color code for the chromatin contact map is given at the bottom.

  • Figure 3.
    Figure 3. Example of a qcGenomics query illustrating the retrieval of datasets by NAVi, their visualization with the embedded genome browser, and their comparison on the basis of global enrichment patterns with the tool qcComparator.

    (A) 1,986 human public datasets corresponding to ChIP-seq assays targeting the histone modification H3K27ac were retrieved with NAVi by the query “H3K27ac hg19”. The results are displayed in a table that also provides further elements, including the biological source, the total mapped reads, and the associated global quality score. (B) A subset of the datasets retrieved in this query is visualized with the embedded NAVi genome browser. Here, local quality-controlled regions (local QC) are displayed as a heat map bar plot (orange gradient bars), accompanied by their read count enrichment pileups (green). Note the differential enrichment pattern for H3K27ac around the androgen receptor gene in LNCaP cells relative to that observed in brain tissue and the inverse when visualizing Neurogenin 2 (NEUROG2). (C) All datasets retrieved in (A) were compared by computing pairwise similarity indexes (Tanimoto) and classified with the t-SNE strategy. This analysis provides a sample stratification, which was further correlated with the related cell/tissue origin. (D) A detailed view, produced by qcComparator, of two of the t-SNE–identified clusters, namely, LNCaP and MCF7/cancer–related cell lines (219 datasets). Here, dataset similarity indexes (Tanimoto) are displayed in a heat map square matrix. Note that LNCaP and VCaP cell–related datasets, both originating from prostate cancers, present a higher degree of similarity to each other than to datasets from other types of cancer cell lines.
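The pairwise comparison in (C) and (D) can be illustrated with a small sketch. It assumes that each dataset is reduced to the set of genomic bins in which it is enriched, which is an assumption made here for illustration; the caption does not state how qcComparator represents a dataset internally. The dataset names and bin indices below are toy values.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto index of two sets: shared elements over total distinct elements."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_tanimoto(datasets: Dict[str, Set[int]]) -> Dict[Tuple[str, str], float]:
    """Tanimoto index for every unordered pair of datasets (names sorted)."""
    return {
        (x, y): tanimoto(datasets[x], datasets[y])
        for x, y in combinations(sorted(datasets), 2)
    }


# Toy enriched-bin sets, not real data.
toy = {
    "dataset_a": {1, 2, 3, 4},
    "dataset_b": {2, 3, 4, 5},
    "dataset_c": {4, 8, 9},
}
scores = pairwise_tanimoto(toy)
```

In this toy example, `dataset_a` and `dataset_b` share three of five distinct bins and therefore score higher with each other than either does with `dataset_c`, which is the kind of contrast a heat map square matrix makes visible.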

  • Figure S1.
    Figure S1. NAVi: an intuitive tool for retrieving public datasets qualified in qcGenomics.

    Example of a query retrieving MCF7-related datasets assessed for the histone modification H3K4me3 and for ATAC-seq (MCF7 H3K4me3 ChIP-seq + MCF7 ATAC-seq). As illustrated here, a query can contain multiple terms, and independent queries can be joined with a plus sign (“+”) to combine their results. As an outcome, NAVi lists all datasets matching the query, together with the associated model organism, the type of biological source (cell/tissue type), the target molecule, the number of mapped reads, the quality score, and a direct link to the original accession ID. NAVi also allows retrieved datasets to be selected for further analysis.
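The query syntax described in this caption can be sketched as follows. The caption states only that a query may contain several terms and that “+” combines the results of independent queries; the sketch additionally assumes that all terms within one sub-query must match a dataset, and that matching is case-insensitive. The record structure, the `search` function, and the toy records are hypothetical and do not reflect the NAVi implementation.

```python
from typing import Dict, List, Set


def parse_query(query: str) -> List[List[str]]:
    """Split a query on '+' into sub-queries, each a list of terms."""
    return [sub.split() for sub in query.split("+") if sub.strip()]


def search(query: str, records: Dict[str, Set[str]]) -> List[str]:
    """Return ids of records matching at least one sub-query.

    Assumption: a record matches a sub-query when it carries every term
    of that sub-query (case-insensitive).
    """
    sub_queries = [[t.lower() for t in sub] for sub in parse_query(query)]
    hits = []
    for record_id, terms in records.items():
        lowered = {t.lower() for t in terms}
        if any(all(t in lowered for t in sub) for sub in sub_queries):
            hits.append(record_id)
    return hits


# Toy annotation records, for illustration only.
records = {
    "ds1": {"MCF7", "H3K4me3", "ChIP-seq"},
    "ds2": {"MCF7", "ATAC-seq"},
    "ds3": {"LNCaP", "H3K27ac", "ChIP-seq"},
}
result = search("MCF7 H3K4me3 ChIP-seq + MCF7 ATAC-seq", records)
```

Here the two sub-queries each select one MCF7 record, and the combined result is their union.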

  • Figure S2.
    Figure S2. Monitoring performance differences between ChIP-seq profiles.

    As profiles are precomputed, NAVi can be interrogated to rapidly display a profile at any genomic locus of interest to monitor the performance of different profiles and differences between them. The example shows four ChIP-seq profiles for the ERG TF binding to WNT7B in human RWPE-1 prostate epithelial cells. The top three ChIP-seq experiments were carried out with anti-ERG Clone 9FY Biocare no. CM421 C, whereas the bottom one used Epitomics antibody 2805-1. The red signal below a peak in the first intron of WNT7B indicates that this peak is stable when using the NGS-QC sub-sampling test. Numbers at the left indicate track heights in terms of cumulated mapped reads.

  • Figure S3.
    Figure S3. Monitoring the gain of a mis-expressed TF in tumor vs normal cells.

    More than half of all prostate cancers harbor the TMPRSS2-ERG fusion. A comparison between the ChIP-seq profiles for ERG of TMPRSS2-positive human VCaP cells (GSM1328978 and GSM2086313) and “normal” human RWPE-1 prostate epithelial cells (GSM927071 and GSM2195110) rapidly reveals the gain of ERG binding in the vicinity of the WNT2 TSS. The anti-ERG antibodies used were Epitomics Cat. No. 2805-1 (GSM1328978 and GSM927071) and anti-ERG Clone 9FY Biocare no. CM421 C (GSM2195110); the antibody was not specified for GSM2086313. Numbers at the left indicate track heights in terms of cumulated mapped reads.

  • Figure S4.
    Figure S4. Monitoring combinatorial TF and chromatin modification events.

    Precomputed profiles (TF binding, epigenetic modification, chromatin accessibility, chromatin looping, RNA/global run-on with high-throughput sequencing [GRO-seq] profiles, etc.) can be searched and displayed for user-defined genomic loci within seconds. Here, a screenshot of NAVi shows the co-binding of the retinoic acid receptor-α (RARα), the estrogen receptor-α (ERα), and the histone acetyltransferase and co-activator P300, together with the K4 methylation and K27 acetylation of histone H3, in MCF7 breast cancer cells at the RARA locus and its environment. Note that RARA, RAPGEFL1, and IGFBP4 are known as classical estrogen receptor–responsive genes. Numbers at the left indicate track heights in terms of cumulated mapped reads.

  • Figure S5.
    Figure S5. qcComparator analysis of 73 ATAC-seq profiles available in the NGS-QC database.

    NAVi was used to query for ATAC-seq profiles corresponding to human CD34+ hematopoietic precursor cells and differentiated CD4+ and CD8+ cells. The query retrieved 12 CD34+, 58 CD4+, and 3 CD8+ profiles. Analysis with qcComparator yielded the above similarity landscapes in false color representation. Four “core” populations can be easily distinguished (framed): CD34+ (nine datasets), CD8+ (three datasets), and two populations of CD4+ cells, a major population (lower right) comprising 31 datasets and a minor population of 17 datasets, which displays a different ATAC-seq landscape from the major population. As qcGenomics specifies the datasets, users can follow up such differences by additional experimentation or scrutiny of particular marks/TFs. Note that no selection (e.g., by quality or number of aligned reads) has been performed before the analysis.

  • Figure S6.
    Figure S6. Example of the similarity index matrix generated by qcComparator.

    In addition to displaying the similarity index levels (Tanimoto or Dice metrics) as a heat map, a panel on the right side makes it possible to highlight the target molecule, the cell/tissue, or the qcStamp associated with each of the displayed datasets.
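For reference, the two metrics named in this caption have the following standard set-based definitions. The symbols $A$ and $B$ are introduced here for illustration and denote the sets of features (for example, enriched genomic regions) of two datasets; the caption does not specify how qcComparator derives these sets.

```latex
T(A,B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},
\qquad
D(A,B) = \frac{2\,|A \cap B|}{|A| + |B|},
\qquad
D = \frac{2T}{1+T}.
```

Because $D$ is a strictly increasing function of $T$ on $[0,1]$, the two metrics rank dataset pairs in the same order; switching between them changes the values shown in the heat map, not the ordering of pairs.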

  • Figure 4.
    Figure 4. Chromatin state analysis over a large number of public datasets with qcChromStater.

    (A) A query concerning several histone modification marks, the co-activator factor P300, and RNA-seq datasets for the MCF7 cell line was performed with NAVi. This query identified 170 datasets, which were compared in a local context (500-nt bins) by performing a combinatorial analysis. (B) Schematic representation of the gene-centric analysis performed by qcChromStater over the combinatorial states retrieved genome-wide. (C) The combinatorial analysis identified 18 different states (State ID: s01-s18), which were further stratified on the basis of their gene coverage (TSS, Gene Body, TES, and proximal/distal regulatory element regions defined at 10 and 50 kb, respectively). In addition, the functional annotation fields were filled on the basis of the knowledge corresponding to the different combinatorial states (e.g., “active enhancer” for regions enriched for the histone modification mark H3K27ac; or “repressed gene” for those associated with the H3K27me3 modification). qcChromStater allows retrieval of the genes related to each of the chromatin states, which can be verified by displaying the enrichment patterns with the NAVi genome browser. (D) Visualization of the PSG gene cluster associated with the state “s03” (repressed gene) in (C). Note that the histone modification mark H3K27me3 is enriched over the entire cluster, confirming the chromatin status predicted by qcChromStater. (E) Visualization of the growth-regulating estrogen receptor binding 1 genomic region, associated with the state “s04” (active promoter) in (C). Note the strong enrichment patterns for the RNA-seq, H3K27ac, EP300, and H3K4me3 datasets, whereas there are very weak signals for the repressive mark H3K27me3.
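The idea of a combinatorial analysis over 500-nt bins can be illustrated with a simplified sketch. It is not the qcChromStater algorithm, which the caption does not describe: here a "state" is simply the set of marks found enriched in a bin, and the interval representation, function names, and toy coordinates are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

BIN_SIZE = 500  # nt, the bin size given in the caption


def combinatorial_states(
    enriched: Dict[str, List[Tuple[int, int]]]
) -> Dict[int, Tuple[str, ...]]:
    """Map each bin index to the sorted tuple of marks enriched in that bin.

    `enriched` maps a mark name to half-open [start, end) intervals on one
    chromosome (a simplifying assumption).
    """
    marks_per_bin: Dict[int, Set[str]] = {}
    for mark, intervals in enriched.items():
        for start, end in intervals:
            first, last = start // BIN_SIZE, (end - 1) // BIN_SIZE
            for b in range(first, last + 1):
                marks_per_bin.setdefault(b, set()).add(mark)
    return {b: tuple(sorted(m)) for b, m in marks_per_bin.items()}


def state_counts(states: Dict[int, Tuple[str, ...]]) -> Counter:
    """Count how many bins carry each distinct combination of marks."""
    return Counter(states.values())


# Toy intervals, not real data.
toy = {
    "H3K27ac": [(0, 1000)],
    "H3K4me3": [(500, 1500)],
    "H3K27me3": [(5000, 5500)],
}
states = combinatorial_states(toy)
counts = state_counts(states)
```

In the toy input, the bin covering positions 500 to 999 carries both H3K27ac and H3K4me3, whereas the bin starting at position 5000 carries only H3K27me3; assigning labels such as "active promoter" or "repressed gene" to these combinations would be a separate annotation step.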

  • Figure S7.
    Figure S7. Gene table display as part of the qcChromStater tool.

    When the user defines functional-state information, it is possible to query for the genes composing each of the defined functional states.

  • Figure S8.
    Figure S8. Genes associated with the combinatorial state “s03”.

    qcChromStater displays the annotated genes associated with the different combinatorial states, stratified by coverage. In the case of “s03”, which corresponds to the repressive mark H3K27me3, >8,000 genes (identified by the enrichment of local QCs in their gene bodies) are retrieved, reflecting major repressed genomic regions. The subgroup of the carcinoembryonic antigen gene family comprising the PSG genes is highlighted in yellow.

qcGenomics
Matthias Blum, Pierre-Etienne Cholley, Valeriya Malysheva, Samuel Nicaise, Julien Moehlin, Hinrich Gronemeyer, Marco Antonio Mendoza-Parra
Life Science Alliance Dec 2019, 3 (1) e201900546; DOI: 10.26508/lsa.201900546

In this Issue

Volume 3, No. 1
January 2020

Subjects

  • Systems & Computational Biology
  • Methods & Resources
  • Genomics & Functional Genomics


ISSN: 2575-1077
© 2021 Life Science Alliance LLC

Life Science Alliance is registered as a trademark in the U.S. Patent and Trade Mark Office and in the European Union Intellectual Property Office.