Life Science Alliance
Resource
Transparent Process
Open Access

Systematic assessment of structural variant annotation tools for genomic interpretation

Xuanshi Liu, Lei Gu, Chanjuan Hao, Wenjian Xu, Fei Leng, Peng Zhang, Wei Li
Xuanshi Liu
1Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children’s Health; Beijing Children’s Hospital, Capital Medical University, Beijing, China
Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Visualization, Writing—original draft, Writing—review and editing
Lei Gu
2Epigenetics Laboratory, Max-Planck Institute for Heart and Lung Research, Cardiopulmonary Institute, Bad Nauheim, Germany
Roles: Visualization, Methodology, Writing—review and editing
Chanjuan Hao
1Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children’s Health; Beijing Children’s Hospital, Capital Medical University, Beijing, China
Roles: Supervision, Project administration, Interpretation of the data
Wenjian Xu
1Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children’s Health; Beijing Children’s Hospital, Capital Medical University, Beijing, China
Roles: Methodology, Interpretation of the data
Fei Leng
1Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children’s Health; Beijing Children’s Hospital, Capital Medical University, Beijing, China
Roles: Interpretation of the data
Peng Zhang
1Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children’s Health; Beijing Children’s Hospital, Capital Medical University, Beijing, China
Roles: Validation, Visualization, Interpretation of the data
Wei Li
1Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children’s Health; Beijing Children’s Hospital, Capital Medical University, Beijing, China
Roles: Supervision, Funding acquisition, Project administration, Writing—review and editing
  • For correspondence: liwei{at}bch.com.cn
Published 10 December 2024. DOI: 10.26508/lsa.202402949

Figures

  • Figure 1.
    Figure 1. Overview of study workflow for SV prioritization benchmarking.

    This workflow illustrates the evaluation process for eight SV prioritization tools, categorized into knowledge-driven and data-driven approaches. These tools were benchmarked across seven independent and curated datasets using three main criteria: (1) accuracy in pathogenicity prediction, (2) robustness in diverse genomic and biological contexts, and (3) usability, focusing on user accessibility and computational performance.

  • Figure 2.
    Figure 2. Comparative performance of eight SV prioritization approaches.

    (A) Correlation analysis between positive (pathogenic) and negative (benign) variant sets across the eight approaches, indicating the differentiation ability of each tool. (B) Distribution of pathogenicity scores for positive and negative sets, showing score separation across the tools. (C) Performance summary across all germline variants from ClinVar, measured by area under the curve.
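
The area under the curve in panel C can be read, assuming it refers to the ROC curve, as the probability that a randomly chosen pathogenic SV scores higher than a randomly chosen benign SV. The sketch below computes that quantity by direct pairwise comparison. The function name `roc_auc` and the toy scores are illustrative assumptions, not code or data from the study.

```python
def roc_auc(positive_scores, negative_scores):
    """Rank-based ROC AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # pathogenic variant ranked above benign
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical pathogenicity scores, for illustration only.
pathogenic = [0.9, 0.8, 0.4]
benign = [0.5, 0.3, 0.1]
auc = roc_auc(pathogenic, benign)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of variants, which is acceptable for benchmark sets of a few hundred SVs; a sort-based ranking would be the usual choice for larger inputs.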

  • Figure 3.
    Figure 3. Performance of SV prioritization tools across genomic contexts.

    SV type, length, and gene content: (A) Distribution of pathogenicity scores for deletion and duplication sets, illustrating score separation across tools by SV type. (B) Performance of each tool in deletions and duplications among germline ClinVar variants, evaluated by area under the curve (AUC). (C) Length distributions of deletions and duplications within the dataset. (D) AUC performance over three length ranges (L1 < 6 × 10³, L2: 6 × 10³ to 10⁵, L3 > 10⁵) for deletions and duplications. (E) Distribution differences in protein-coding gene coverage between negative (benign) and positive (pathogenic) SV sets. (F) AUC comparison by gene context (disease-related, functional genes) for deletions and duplications, further categorized by SVs covering zero genes (No. genes = 0) and one or more genes (No. genes ≥ 1). AUC, area under the curve; SV, structural variant.
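
The three length groups in panel D can be expressed as a small binning function. This is a sketch under stated assumptions: lengths are taken to be in base pairs, coordinates are assumed to be 0-based half-open (BED style) so that length is `end - start`, and the function name `length_group` is hypothetical.

```python
def length_group(start, end):
    """Assign an SV to L1, L2, or L3 using the thresholds in the caption.

    Assumes BED-style coordinates, so length = end - start (in bp).
    """
    length = end - start
    if length < 0:
        raise ValueError("end must not be smaller than start")
    if length < 6_000:        # L1: shorter than 6 x 10^3
        return "L1"
    if length <= 100_000:     # L2: 6 x 10^3 up to and including 10^5
        return "L2"
    return "L3"               # L3: longer than 10^5


# Hypothetical intervals, for illustration only.
for start, end in [(1_000, 3_500), (10_000, 60_000), (0, 250_000)]:
    print(start, end, length_group(start, end))
```

The boundary values 6 × 10³ and 10⁵ are placed in L2 here because the caption defines L1 and L3 with strict inequalities.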

  • Figure S1.
    Figure S1. Distribution and intersection analyses of disease-relevant and functionally relevant genes.

    (A) Upset plot showing the distribution and overlap of disease-relevant gene sets across three databases: ACMG (American College of Medical Genetics), ClinGen (Clinical Genome Resource), and Orphanet. Vertical bars indicate the size of each intersection, while horizontal bars represent the total number of genes in each database. The largest group is unique to ClinGen (3,686 genes), with additional smaller overlaps across combinations of the three datasets. (B) Upset plot of functionally relevant genes across four experimental and phenotypic databases: cell culture, pTriplo (probability of triplosensitivity), MGI (Mouse Genome Informatics), and pHaplo (probability of haploinsufficiency). The highest number of unique genes is in cell culture (1,351 genes), with other notable intersections among database combinations. Vertical bars show intersection sizes, while horizontal bars display total gene counts in each dataset.
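
The vertical bars of an upset plot are usually the sizes of exclusive intersections, that is, genes found in exactly one combination of databases and in no other. A minimal sketch of that computation follows. The helper `exclusive_intersections` and the placeholder gene names are assumptions made for illustration; they are not the gene lists used in the study.

```python
from itertools import combinations


def exclusive_intersections(named_sets):
    """Map each combination of set names to the items found in exactly those sets."""
    names = sorted(named_sets)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Items shared by every set in the combination...
            inside = set.intersection(*(named_sets[n] for n in combo))
            # ...minus anything that also appears in a set outside it.
            outside = set().union(*(named_sets[n] for n in names if n not in combo))
            members = inside - outside
            if members:
                result[frozenset(combo)] = members
    return result


# Placeholder gene names, for illustration only.
toy = {
    "ACMG": {"GENE_A", "GENE_B"},
    "ClinGen": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "Orphanet": {"GENE_B", "GENE_D", "GENE_E"},
}
groups = exclusive_intersections(toy)
for combo, genes in groups.items():
    print(sorted(combo), len(genes))
```

The horizontal bars correspond to `len(toy[name])` for each database, and each vertical bar to the size of one entry in `groups`.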

  • Figure 4.
    Figure 4. Performance across approaches covering various biological mechanisms, including noncoding SVs, long-range SVs, somatic SVs, GWAS SVs, and eQTL SVs.

    AUC, area under the curve; SV, structural variant.

Tables

    Table 1.

    Overview of the approaches evaluated in this work.

    | Software (a) | Year | Main language | Assumption | Classifier | Training set | Result | URL |
    |---|---|---|---|---|---|---|---|
    | AnnotSV (Version: 3.3.6) | 2018 | Tcl, Shell, Python | ACMG | ACMG | Implementation of ACMG guideline. | Annotation, scores | https://lbgi.fr/AnnotSV/ |
    | CADD-SV (Version 1.1) | 2022 | Python, R | Evolutionary fitness | Random forest | Randomly distributed SVs over the human autosomes, evolutionarily fixed chimpanzee and human-derived SVs. | Scores | https://cadd-sv.bihealth.org/ |
    | ClassifyCNV (Version 1.1.1) | 2020 | Python, Shell | ACMG | ACMG | Implementation of ACMG guideline. | Scores | https://github.com/Genotek/ClassifyCNV |
    | dbCNV | 2023 | Perl, Shell | Molecular functions | Gradient boosted trees | The ClinVar, dbVar, ClinGen, DGV, DECIPHER and gnomAD (accessed before January 2023) | Classification | https://github.com/lllllv-1/dbCNV |
    | StrVCTVRE (Version 1.7) | 2022 | Python | Molecular functions on exons | Random forest | Rare SVs from ClinVar, gnomAD, and a recent great ape sequencing study. | Scores | https://strvctvre.berkeley.edu/ |
    | SVScore (Version 0.6) | 2017 | Perl, Shell | SNPs-based CADD scores | Derived from CADD (b) | The precomputed SNP scores generated by CADD v1.3 | Scores | https://github.com/lganel/SVScore |
    | TADA (Version 1.0.2) | 2022 | Python, Shell | Molecular functions related to long range interaction | Random forest | DECIPHER, Variants in the set published by Audano et al (2019), GnomAD, UK Biobank data set and DGV. | Scores | https://github.com/jakob-he/TADA/ |
    | XCNV | 2022 | R, Shell | Molecular functions | XGBoost | The dbVar, ClinGen, DECIPHER v10.1, and DGV (accessed before January 2021). | Scores | https://github.com/kbvstmd/XCNV |

    • (a) Software version was given if available.

    • (b) CADD was generated by the support vector machine.

    Table 2.

    Summary of seven independent datasets used in this study.

    | Benchmark dataset | Positive set (number of positive variants) | Negative set (number of negative variants) |
    |---|---|---|
    | Germline SVs (a) from ClinVar and gnomAD | “Pathogenic” and “likely pathogenic” germline SVs from ClinVar (January 2023–April 2024) (N = 489). | (1) “Benign” and “likely benign” germline SVs from ClinVar (January 2023–April 2024) (N = 93); (2) randomly selected rare SVs from gnomAD v4, length-matched to the positive set (N = 396). |
    | Noncoding SVs and gnomAD | Noncoding SVs from peer-reviewed publications (N = 6). | Randomly selected rare SVs from gnomAD v4, length-matched to the positive set, with no overlap with protein-coding genes listed in GENCODE v30lift37 (N = 6). |
    | Long-range SVs and gnomAD | SVs implicated in long-range interactions from peer-reviewed publications (N = 12). | Randomly selected rare SVs from gnomAD v4, length-matched to the positive set (N = 12). |
    | Somatic SVs | Somatic SVs from COSMIC (v99) with recurrence ≥ 2 and located on risk genes listed in OncoKB (N = 218). | Randomly selected somatic SVs from COSMIC (v99) with recurrence = 1 and no overlap with risk genes listed in OncoKB (N = 238). |
    | Disease-associated SVs from a GWAS and gnomAD | Rare SVs validated by replication, as listed in the peer-reviewed publication (N = 32). | Randomly selected rare SVs from gnomAD v4, length-matched to the positive set (N = 32). |
    | Functionally relevant SVs from eQTL studies and gnomAD | Rare SVs for which aberrant gene expression occurs in multiple tissues and the gene has a dosage change (N = 72). | Randomly selected rare SVs from gnomAD v4, length-matched to the positive set (N = 72). |

    • (a) SVs include CNVs, deletions, and duplications.
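
Several negative sets in Table 2 are built by randomly drawing rare gnomAD SVs whose lengths match the positive set. The table does not state how a match is defined, so the sketch below makes explicit assumptions: a candidate matches when its length is within a relative `tolerance` of the positive SV's length, candidates are drawn without replacement, and the candidate pool is assumed to be already filtered to rare SVs. The function name `sample_length_matched` and all numbers are hypothetical.

```python
import random


def sample_length_matched(positive_lengths, candidate_lengths, tolerance=0.1, seed=0):
    """For each positive SV, pick one unused candidate of similar length.

    Returns a list of candidate indices (None where no candidate matches).
    The 10% relative tolerance is an assumption, not a value from the study.
    """
    rng = random.Random(seed)          # fixed seed for reproducible draws
    available = list(range(len(candidate_lengths)))
    chosen = []
    for target in positive_lengths:
        matches = [
            i for i in available
            if abs(candidate_lengths[i] - target) <= tolerance * target
        ]
        if not matches:
            chosen.append(None)
            continue
        pick = rng.choice(matches)
        available.remove(pick)         # sample without replacement
        chosen.append(pick)
    return chosen


# Hypothetical SV lengths in bp, for illustration only.
positives = [1_000, 50_000, 1_000, 1_000]
candidates = [950, 1_200, 52_000, 1_050, 400_000]
picks = sample_length_matched(positives, candidates)
print(picks)
```

Drawing one negative per positive keeps the two sets the same size, which mirrors the equal N values reported for most dataset pairs in the table.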

    Table 3.

    Summary of computational efficiency and user-friendliness over all approaches.

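    | Software | AnnotSV | ClassifyCNV | CADD-SV | dbCNV | StrVCTVRE | SVScore | TADA | XCNV |
    |---|---|---|---|---|---|---|---|---|
    | Category | Knowledge driven | Knowledge driven | Data driven | Data driven | Data driven | Data driven | Data driven | Data driven |
    | Document quality | Excellent | Normal | Good | Normal | Good | Normal | Good | Good |
    | Installation | cmd, conda, docker | cmd | conda | cmd | cmd, conda | cmd | cmd | cmd |
    | Prerequisite dataset | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
    | Genome build | hg19, hg38 | hg19, hg38 | hg38 | hg19 | hg19, hg38 | hg19 | hg19 | hg19 |
    | Input | Bed, vcf | Bed | Bed | Bed | Bed | vcf | Bed | Bed |
    | SV types | Deletion, insertion, duplication, inversion, breakend record | CNV | Deletion, insertion and duplication | CNV | Deletion, duplication | All types | CNV | CNV |
    | Online webserver | Yes | No | Yes | No | Yes | No | No | Yes |

    Efficiency (second)123100 (20 cores)381951624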

Supplementary Materials

  • Table S1. SV prioritization methods to be evaluated (Erikson et al, 2015; McLaren et al, 2016; Geoffroy et al, 2018; Huynh & Hormozdiari, 2019; Spector & Wiita, 2019; Kumar et al, 2020; Nieboer & de Ridder, 2020; Bhattacharya et al, 2021; Fan et al, 2021; Fino et al, 2021; Minoche et al, 2021; Requena et al, 2021; Yang et al, 2022; Danis et al, 2022; Ding et al, 2023; Macnee et al, 2023).

  • Table S2. Detailed description of seven independent data sources used in this study.

  • Table S3. Performances over all approaches.

  • Table S4. Performance across approaches in two SV types.

  • Table S5. Performance across approaches in different length groups.

  • Table S6. Performance across approaches in different groups of genes.

  • Table S7. Curated datasets from publications for biological mechanisms.

Benchmarking SV prioritization tools
Xuanshi Liu, Lei Gu, Chanjuan Hao, Wenjian Xu, Fei Leng, Peng Zhang, Wei Li
Life Science Alliance Dec 2024, 8 (3) e202402949; DOI: 10.26508/lsa.202402949

Volume 8, No. 3
March 2025
Subjects

  • Genomics & Functional Genomics
  • Methods & Resources
  • Systems & Computational Biology

ISSN: 2575-1077
© 2025 Life Science Alliance LLC
