Life Science Alliance
Methods
Transparent Process
Open Access

Targeted variant detection using unaligned RNA-Seq reads

Eric Olivier Audemard (1), Patrick Gendron (1), Albert Feghaly (1), Vincent-Philippe Lavallée (1,2), Josée Hébert (1,2,4,5), Guy Sauvageau (1,2,4,5), Sébastien Lemieux (1,3)

1 The Leucegene Project at Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada
2 Division of Hematology, Maisonneuve-Rosemont Hospital, Montréal, Canada
3 Department of Biochemistry, Université de Montréal, Montréal, Canada
4 Quebec Leukemia Cell Bank, Maisonneuve-Rosemont Hospital, Montréal, Canada
5 Department of Medicine, Faculty of Medicine, Université de Montréal, Montréal, Canada

For correspondence: s.lemieux@umontreal.ca
Published 19 August 2019. DOI: 10.26508/lsa.201900336
Figures

    Figure 1. Overview of km.

    (A) Visual description of how km detects the DNMT3A R882 SNP, using k-mers of 31 bp. The input sequence (target sequence given by user) is centered on DNMT3A’s 882nd codon. This sequence is segmented in k-mers to create a linear graph, which represents the search space delimited by the starting and ending k-mers (hatched). (B) A variant will be represented by a new path between the two extremities. This path is found by walking along the linear directed graph and following new (i.e., not seen in the target sequence) overlapping k-mers, queried from a sample’s count table. (C) Schematic representation of an SNP k-mer graph with each path representing the target (lower) and the variant (upper) sequences. (D) Same representation for an ITD variant. (E) Same representation showing the use of a variant target sequence to initiate km’s search. Here, the expected variant is detected when all k-mers that overlap the starting and ending k-mers (in red) are found in the sample’s count table. Also, additional variants could be detected if another path is found (in orange).
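The walk described in panels A and B can be illustrated with a minimal sketch. It is not km's implementation: the function names, the toy 15-bp target, and k = 5 are assumptions chosen for readability, and the count table is a plain in-memory counter rather than a Jellyfish table. The sketch walks from the first to the last k-mer of the target, follows every overlapping k-mer present in the sample, and reports paths that use at least one k-mer absent from the target.

```python
from collections import Counter


def kmers(seq, k):
    """Return the overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def find_variant_paths(target, counts, k, max_len=500):
    """Walk from the first to the last k-mer of `target`.

    `counts` maps k-mers to their counts in the sample (hypothetical
    stand-in for a k-mer count table). A path is reported only if it
    uses at least one k-mer that is not part of the target sequence.
    `max_len` bounds the walk so that cycles cannot run forever.
    """
    ref = kmers(target, k)
    start, end = ref[0], ref[-1]
    ref_set = set(ref)
    found = []
    # Each stack entry: (current k-mer, sequence spelled so far, saw a new k-mer?)
    stack = [(start, start, False)]
    while stack:
        node, seq, novel = stack.pop()
        if node == end:
            if novel:
                found.append(seq)
            continue
        if len(seq) >= max_len:
            continue
        for base in "ACGT":
            nxt = node[1:] + base  # overlaps `node` by k - 1 bases
            if counts.get(nxt, 0) > 0:
                stack.append((nxt, seq + base, novel or nxt not in ref_set))
    return found


# Toy example: the sample contains both the target and a single-base variant.
target = "ATCGGATTCAGGCTA"
variant = "ATCGGATACAGGCTA"
k = 5
counts = Counter(km for read in (target, variant) for km in kmers(read, k))
print(find_variant_paths(target, counts, k))
```

In this toy case the only new path between the two extremities is the one spelled by the variant read. A sample containing only target k-mers yields no path, which corresponds to the absence of a variant.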

    Figure S1. Comparison of k-mer count table and BAM file sizes.

    K-mer count tables were created with Jellyfish, using k = 31 bp and a minimum k-mer count of 2. BAM files were produced by STAR alignment.
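To make the effect of the minimum count concrete, here is a small sketch of a count table with a lower-count filter. It is an in-memory illustration only, not Jellyfish: the function name `build_count_table` is hypothetical, and strand handling (canonical k-mers) is ignored as a simplifying assumption.

```python
from collections import Counter


def build_count_table(reads, k=31, min_count=2):
    """Count every k-mer in `reads` and keep those seen at least `min_count` times.

    Hypothetical helper for illustration; k-mers seen once are dropped,
    mirroring the minimum k-mer count of 2 mentioned in the caption.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Two toy reads sharing one 4-mer; only the shared k-mer survives the filter.
table = build_count_table(["ACGTA", "ACGTC"], k=4, min_count=2)
print(table)
```

Dropping k-mers observed only once removes much of the noise from sequencing errors, which is one reason a filtered table can be compact.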

    Figure S2. Percentage of transcriptome sequences from Ensembl annotation (GRCh38.82) that can be represented by a linear directed graph.

    At k = 31 bp, 96.76% of the transcriptome can be used as a target sequence. The transcript requiring the largest k to achieve a linear representation is ENST00000621744_NBPF19 and would require k = 3,472 bp.
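One way to test whether a sequence admits a linear representation at a given k is sketched below. The criterion used here is an assumption, not necessarily the one behind the figure: a sequence is treated as linear when all of its k-mers are distinct and each k-mer has at most one successor and one predecessor among the sequence's own k-mers. The function names are hypothetical.

```python
def is_linear(seq, k):
    """Return True if the k-mers of `seq` form a single unbranched path.

    Assumed criterion: no repeated k-mer, and no k-mer with more than one
    successor or predecessor among the k-mers of the same sequence.
    """
    kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kms or len(set(kms)) != len(kms):
        return False
    nodes = set(kms)
    for km in kms:
        successors = sum((km[1:] + b) in nodes for b in "ACGT")
        predecessors = sum((b + km[:-1]) in nodes for b in "ACGT")
        if successors > 1 or predecessors > 1:
            return False
    return True


def smallest_linear_k(seq, k_min=2):
    """Return the smallest k >= k_min for which `seq` is linear, or None."""
    for k in range(k_min, len(seq) + 1):
        if is_linear(seq, k):
            return k
    return None


# A toy sequence with an internal repeat (ACG occurs twice).
print(smallest_linear_k("ACGTACGA"))
```

Under this criterion a repeat of length r forces k to exceed r, which is consistent with a transcript containing very long internal repeats needing a very large k.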

Tables

    Table 1.

    Current catalog of targeted mutations for AML.

    Gene          Expected type of variant    Gene location       Target sequence length (bp)    Average running time [a] (s/sample)
    Reference target
    IDH1          SNP (R132)                  Exon 4              65                             0.008
    DNMT3A        SNP (R882)                  Exon 23             65                             0.01
    NPM1          4-bp insertion              Exon 10–11 + UTR    80                             0.017
    FLT3          ITD                         Exon 13–15          345                            0.107
    FLT3          TKD                         Exon 20             68                             0.019
    MYC           SNP (T58A/P59R)             Exon 2              68                             0.018
    Variant target
    NUP98–NSD1    Fusion                      Exon 11 + 7         62                             0.07
    NSD1–NUP98    Fusion                      Exon 6 + 13         62                             0.061
    KMT2A         PTD                         Exon 8 + 2          62                             0.067
    • [a] Average computation times are reported for Leucegene samples and assume that k-mer count tables are cached in RAM before running km. The performance of the caching step is highly dependent on I/O architecture, taking around 25 s on a typical system. The approaches used to prepare each target sequence for detecting the expected mutations are presented in the Materials and Methods section.

    Table 2.

    Variants identified by km using our AML catalog in the Leucegene and TCGA cohorts.

    Dataset         Mutation name   Km type                                 Variant   Target   Number of samples
                                    Ins     Del   Indel   Sub   ITD   I&I
    Leucegene       IDH1 R132       0       0     0       32    0     0     32        437      437
                    DNMT3A R882     0       0     0       64    0     0     64        436
                    NPM1 4-bp ins   22      0     1       0     103   13    139       437
                    FLT3–ITD        1033388354162 (cell boundaries lost)              429
                    FLT3–TKD        0       4     0       31    0     0     34        434
                    MYC T58A/P59R   0       0     0       2     0     0     2         437
                    NUP98–NSD1      7 [a]   0     0       0     0     0     7         6
                    NSD1–NUP98      0       0     0       0     0     0     0         2
                    KMT2A–PTD       10 [b]  0     0       0     0     0     10        15
    TCGA (AML)      IDH1 R132       0       0     0       11    0     0     11        148      151
                    DNMT3A R882     0       0     0       12    0     0     12        149
                    NPM1 4-bp ins   6       0     0       0     28    6     40        151
                    FLT3–ITD        7605222018 (cell boundaries lost)       100       142
                    FLT3–TKD        0       1     0       11    0     0     12        149
                    MYC T58A/P59R   0       0     0       2     0     0     2         139
                    NUP98–NSD1      0       0     0       0     0     0     0         0
                    NSD1–NUP98      0       0     0       0     0     0     0         2
                    KMT2A–PTD       3 [b]   0     0       0     0     0     3         0
    TCGA (non-AML)  IDH1 R132       0       0     0       394   0     0     394       9,850    10,256
                    DNMT3A R882     0       0     0       0     0     0     0         9,267
                    NPM1 4-bp ins   0       0     0       0     0     0     0         10,232
                    FLT3–ITD        0       0     0       9     0     0     9         163
                    FLT3–TKD        0       0     0       0     0     0     0         361
                    MYC T58A/P59R   0       0     0       5     0     0     5         9,204
                    NUP98–NSD1      0       0     0       0     0     0     0         0
                    NSD1–NUP98      0       0     0       0     0     0     0         0
                    KMT2A–PTD       0       0     0       0     0     0     0         0
    • [a] Fusion with exon 12 found as an insertion in the target sequence.

    • [b] Tandem duplication extended with exon 9 or 9 and 10.

    • Each dataset is split into two parts: reference target and variant target (italic). The “target” column reports the number of samples expressing the target sequence. The “variant” column shows the number of samples where at least one variant of the target sequence is found. As a variant target sequence represents a mutated sequence (Fig 1E), mutated samples counts are indicated in bold. The columns in “km type” identify the specific types of variants detected. Of note, several types of variants can be identified in a given sample. As expected, SNVs on IDH1 are found in AML and non-AML samples on lower grade glioma (LGG) (see Table S1).

    Table 3.

    NPM1 mutations identified by km.

    Type     COSMIC ID     Km variant   Km type     Leucegene   TCGA AML
    A        COSM17559     TCTG         ITD         103         28
    D        COSM17573     CCTG         I&I         11          4
    B        COSM17571     CATG         Insertion   10          4
             COSM20809     CCAG         Insertion   3           0
             COSM29814     CAGA         Insertion   2           0
             COSM3356078   CAAG         Insertion   1           0
             COSM29814     CCGA         Insertion   1           0
             COSM3356078   CGCG         Insertion   1           1
             COSM28066     CGGA         Insertion   1           0
             COSM20811     CTTG         Insertion   1           0
             COSM20815     TATG         I&I         1           2
             COSM20813     TCGG         I&I         1           0
             COSM27390     TTCG         Insertion   1           0
             COSM20850     AGAA         Insertion   1           0
             COSM20810     TTGT         Insertion   0           1
             Unknown       gc/CAGGG     Indel       1           0
    Total mutated/total                             138/437     40/151

Supplementary Materials

  • Table S1 Details of TCGA mutated samples on IDH1.

  • Table S2 Contingency tables on NPM1 insertion and FLT3–ITD.

  • Table S3 Summary of all variants found by km, ITDassembler, Pindel, and Genomon ITDetector on 28 TCGA AML samples for which exome and RNA sequencing were available.

Targeted variant detection for RNA-Seq
Eric Olivier Audemard, Patrick Gendron, Albert Feghaly, Vincent-Philippe Lavallée, Josée Hébert, Guy Sauvageau, Sébastien Lemieux
Life Science Alliance Aug 2019, 2 (4) e201900336; DOI: 10.26508/lsa.201900336
