Resource
Transparent Process
Open Access

Gastric cancer genomics study using reference human pangenomes

Du Jiao, Xiaorui Dong, Shiyu Fan, Xinyi Liu, Yingyan Yu, Chaochun Wei
Du Jiao
1Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
Roles: Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft
Xiaorui Dong
1Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
Roles: Software, Formal analysis, Validation, Investigation, Visualization, Methodology
Shiyu Fan
1Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
Roles: Data curation, Software, Formal analysis, Validation, Investigation, Methodology
Xinyi Liu
1Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
Roles: Data curation, Formal analysis, Investigation, Methodology
Yingyan Yu
2Department of General Surgery of Ruijin Hospital, Shanghai Institute of Digestive Surgery, and Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Roles: Conceptualization, Supervision, Funding acquisition, Investigation, Methodology, Writing—original draft, Project administration, Writing—review and editing
  • For correspondence: yingyan3y@sjtu.edu.cn
Chaochun Wei
1Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
Roles: Conceptualization, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
  • For correspondence: ccwei@sjtu.edu.cn
Published 27 January 2025. DOI: 10.26508/lsa.202402977

Figures

  • Figure S1. Construction of GGCPan and comparison of mapping rate using three different reference genomes.

    (A) Construction pipeline of the gastric cancer graph pangenome GGCPan. (B) Histogram of the distribution of the number of SVs detected in the 185 samples. The SVs are detected by paftools.js based on minimap2 alignment results and are used to construct GGCPan. (C) Pipeline for aligning the non-reference sequences of GCPan to GGCPan. (D) Read mapping rates of 185 gastric tumor samples on three reference genomes.

  • Figure S2. Performance comparison of structural variant detection using three different reference genomes.

    (A) Comparison of the performance of different tools for structural variation detection using GRCh38 as the reference, on simulated data with different sequencing depths. (B) Effect of the completeness of the graph-modeled pangenome on its performance in detecting structural variants. The x-axis represents the number of samples used to construct the graph-modeled pangenome. The five samples used for evaluation were excluded from the samples used to construct the five graph pangenomes. (C) Flowchart of the evaluation of different reference genomes and variant identification tools using sequencing data from the GIAB HG002 sample. Different colors represent different identification pipelines. (D) Performance evaluation results for variant identification using different reference genomes.

  • Figure 1. Performance of structural variant detection using three different reference genomes.

    (A) Comparison of the performance of structural variant detection using three different reference genomes in simulated data. (B) Number of somatic structural variants detected using three reference genomes in real sequencing data from 185 patients. (C) Comparison of SVs detected using GRCh38 and GGCPan in 185 patients. (D) Comparison of SVs detected using GCPan and GGCPan in 185 patients. (C, D) “+” stands for presence and “−” for absence. (E) Enriched pathways for SV-related genes. The SVs are detected using GGCPan in 185 samples. The size of the dot represents the number of related genes included in the pathway.

  • Figure S3. Comparison of structural variants in simulated data (SimuA) using GCPan and GRCh38 as the references.

    (A) Overlap of GRCh38-based SVs and GCPan-based SVs in the five simulated samples (SimuA). (B) Example of an insertion that was detected using GRCh38 but not using GCPan. (C) Example of a deletion that was detected using GRCh38 but not using GCPan. (B, C) Gray bars represent the alignment of reads at this position when using GRCh38 and GCPan as the reference genomes, respectively.

  • Figure S4. Population frequency of SVs detected in 185 samples using GRCh38, GCPan, and GGCPan.

  • Figure 2. Comparison of numbers and types of small variants detected using three different reference genomes.

    (A) Numbers of SNPs and indels detected in 185 patients with gastric tumors. Transitions and transversions are subtypes of SNPs. Insertions and deletions are subtypes of indels. (B) Numbers of different functional types of small variants (SNP, indel) detected based on the three reference genomes. (C) Left histograms: numbers and types of small variants (SNP, indel) detected using the three reference genomes in 185 patients; right histograms: types and numbers of variants in the 10 genes with the highest mutation rates. The numbers at the top of each histogram represent the mutation rate. Top, middle, and bottom represent results using GRCh38, GCPan, and GGCPan as the reference genomes, respectively.

  • Figure S5. There was little difference among results using different reference genomes on the detection of small variants, tumor mutation burden (TMB), and microsatellite instability (MSI).

    (A) 23 genes with mutation rates differing by more than 5% in 185 samples using the three reference genomes. The y-axis represents the number of mutations per gene in 185 samples. (B) TMBs in different cohorts. The three cohorts labeled in bold black are our gastric cancer data using three reference genomes. (C) Results of correlation tests between TMB and MSI with each phenotype. Continuous phenotypes (e.g., age and tumor diameter) were subjected to Spearman’s correlation test using the calculated values of TMB and MSI; other phenotypes were subjected to Fisher’s exact test using the state values of TMB and MSI (TMB-H/TMB-L, MSI-H/MSI-L). Each grid color corresponds to the negative logarithm of the P-value of the correlation test. “*” indicates that the P-value is between 0.05 and 0.01, “**” indicates that the P-value is between 0.01 and 0.001, “***” indicates that the P-value is less than 0.001, and unlabeled positions indicate that the correlation is not significant. (D) Sample distribution of TMB-H, TMB-L and MSI-H, MSI-L/MSS in the subtypes of location, Borrmann, and Lauren.

  • Figure 3. Comparison of candidate driver genes detected using different reference genomes.

    (A) Candidate driver genes of gastric cancer detected using the three reference genomes. The left bar graph shows the −log10(q) value of each gene, and the “*” next to a gene name indicates that the gene was determined as a driver gene using this reference genome. The q-value here stands for the significance of the gene being identified as a driver gene. The right bar graph represents the number of mutations and mutation types for each gene. The upper bar graph represents the TMB values of each sample using the three different reference genomes. (B) Enriched pathways related to the candidate driver genes. The significance threshold for enrichment analysis was P < 0.05. Numbers in parentheses indicate the reference genome with which the gene was identified as a driver gene: “1” represents GRCh38, “2” represents GCPan, and “3” represents GGCPan. (C) Overlap of the candidate driver genes detected using the three reference genomes. “#” indicates that three of the four genes are related to cancers in previous studies. “##” indicates that this gene is related to cancers in previous studies. “###” indicates that this gene is related to cancers in previous studies.

  • Figure S6. Correlation between the mutation status of the 24 significantly mutated genes in the cohort (yes/no mutation) and the clinical phenotype of the 185 patients.

    (A) Genes significantly related to phenotypes using GRCh38. (B) Genes significantly related to phenotypes using GCPan. (C) Genes significantly related to phenotypes using GGCPan. “*” indicates that the P-value is between 0.05 and 0.01, “**” indicates that the P-value is between 0.01 and 0.001, “***” indicates that the P-value is less than 0.001, and unlabeled positions indicate that the correlation is not significant.

  • Figure 4. Comparison of molecular subtypes, candidate driver genes, and structural variations with previous studies.

    (A) Decision trees of molecular subtypes of the 185 gastric cancer patients and TCGA-STAD samples. (B) Comparison of candidate driver genes detected using three reference genomes in the 185 samples and those from two different gastric cancer cohorts (TCGA-STAD and Stomach-AdenoCA). The blue color represents the mutation rates of genes in each cohort. The gray color represents unknown mutation rate information for the gene. A circle indicates that the gene was determined to be a driver gene in this cohort. (C) Comparison of structural variants detected using GGCPan and MC, a graph pangenome constructed with healthy samples. “+” stands for presence and “−” for absence. (D) There is no overlap between the 24 candidate driver genes and the genes found to be significantly associated with the phenotype by GCPan PAV analysis.

  • Figure S7. Differences in somatic copy-number variants of the four subtypes, with red color representing copy-number amplification and blue color representing copy-number deletion.

    The color bar on the left represents the division of the 185 samples into four subtypes. The heatmap represents the copy-number variation for each sample. The red color represents copy-number amplification, and the blue color represents copy-number deletion.
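The captions of Figures S5 and S6 describe a star notation for P-values, and Figure S5 colors each grid cell by the negative logarithm of the P-value. The following is a minimal sketch of that labeling, not the authors' code. The function names `significance_label` and `heatmap_value` are hypothetical, the assignment of boundary values (exactly 0.05, 0.01, or 0.001) to a bin is an assumption, and base 10 for the logarithm is also an assumption because the caption does not state the base.

```python
import math


def significance_label(p_value: float) -> str:
    """Return the star label for a P-value, following the caption thresholds.

    Assumption: each upper bound is exclusive, so p == 0.05 is unlabeled.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P-value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant: the position is left unlabeled


def heatmap_value(p_value: float) -> float:
    """Return -log10(P), the quantity mapped to grid color.

    Assumption: base 10; the caption only says "negative logarithm".
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P-value must lie in (0, 1]")
    return -math.log10(p_value)
```

For example, under these assumptions a test with P = 0.005 would be labeled “**” and colored according to a value of about 2.3.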

Tables

    Table 1. Time required to analyze each sample using three different reference genomes (clock hours).

    Reference                   GRCh38   GCPan   GGCPan
    Construct                   —        <5      3
    Alignment                   7        7.5     2
    GATK preprocess             12       13      12
    SNP and indel detect        6        7       6
    Structural variant detect   2        2.5     3

    “—”: not needed. Only the time and memory required after the genome assembly was constructed are listed.

    Table 2. Memory for each sample using three reference genomes (GB).

    Reference                   GRCh38   GCPan   GGCPan
    Construct                   —        <200    400
    Alignment                   40       40      60
    GATK preprocess             40       40      12
    SNP and indel detect        4        4       4
    Structural variant detect   28       28      320

    “—”: not needed. Only the time and memory required after the genome assembly was constructed are listed.

Supplementary Materials

  • Supplemental Data 1.

    Evaluation of the impact of different aspects on structural variant detection, including different reference genomes, structural variation identification methods, and whole-genome sequencing depths. [LSA-2024-02977_Supplemental_Data_1.docx]

  • Table S1. Genotyping evaluation on the Genome in a Bottle dataset from public literature (Hickey et al, 2020).

  • Table S2. Tumor mutation burdens of 185 samples using three genomes.

  • Table S3. Calculation of kappa values using different references.

  • Table S4. Primers of the candidate MSI-H markers.

  • Supplemental Data 2.

    Manual check of the four driver genes detected only by GGCPan. [LSA-2024-02977_Supplemental_Data_2.docx]

  • Table S5. MutSigCV output reports of the four driver genes.

  • Table S6. Information of the 24 candidate driver genes.

  • Table S7. Mutation rate of significantly mutated genes.

Citation Tools
Pangenome-based disease genomics
Du Jiao, Xiaorui Dong, Shiyu Fan, Xinyi Liu, Yingyan Yu, Chaochun Wei
Life Science Alliance Jan 2025, 8 (4) e202402977; DOI: 10.26508/lsa.202402977


In this Issue

Volume 8, No. 4
April 2025

Subjects

  • Genomics & Functional Genomics

ISSN: 2575-1077
© 2025 Life Science Alliance LLC

Life Science Alliance is registered as a trademark in the U.S. Patent and Trade Mark Office and in the European Union Intellectual Property Office.