Life Science Alliance
Research Article
Transparent Process
Open Access

On the relation of gene essentiality to intron structure: a computational and deep learning approach

Ethan Schonfeld (1), Edward Vendrow (1), Joshua Vendrow (2), Elan Schonfeld (3)
1 Stanford University, Stanford, CA, USA
2 University of California, Los Angeles, Los Angeles, CA, USA
3 Glenbrook North High School, Northbrook, IL, USA
For correspondence (Ethan Schonfeld): eschon22@stanford.edu
Published 27 April 2021. DOI: 10.26508/lsa.202000951

Figures

    Figure 1. Details of convolutional neural network and testing results.

    (A) Our model uses a convolutional architecture to predict intron essentiality. The convolutional layer contains multiple filters that detect motifs within the intronic sequence. The pooling layer then averages each filter’s response across the sequence to capture the cumulative presence of motifs. The resulting values are fed into a fully connected layer followed by a two-value softmax output layer, whose outputs correspond to the probabilities that the intron belongs to an essential or a nonessential gene. The best-performing model from our hyperparameter search used 128 convolutional filters with a window size of 24 and a fully connected layer with 128 neurons. We found the best results when training with an L2 regularization parameter of 10⁻⁶ and a dropout rate of 0.2. We trained two models, one on the first 1,000 bp of introns and one on the last 1,000 bp. The first 1,000 bp include the 5′ splice site; the last 1,000 bp include the 3′ splice site and the branch site. In all following results, each model is tested on its respective section of the intronic sequence.

    (B) The model trained on the first 1,000 bp of introns had an AUC of 0.734, and the model trained on the last 1,000 bp had an AUC of 0.725. We predicted gene essentiality using a majority classifier over all introns of a gene. The majority classifier built on the model trained on the first 1,000 bp achieved an AUC of 0.825, and the one built on the model trained on the last 1,000 bp achieved an AUC of 0.823. Averaging the outputs of both majority classifiers improved performance further; this combined classification strategy achieved an AUC of 0.846.

    (C) Because the first intron is known to have unique properties, we separately tested the models on first introns only and saw improved performance. On first introns, the model trained on the first 1,000 bp had an AUC of 0.745 and the model trained on the last 1,000 bp had an AUC of 0.763. Averaging the outputs of both models into a dual average prediction further improved first-intron essentiality prediction, achieving an AUC of 0.793. These results suggest that unique properties characterize first introns in essential versus nonessential genes.
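
    The data flow described in the Figure 1 caption can be sketched in plain Python as follows. This is an illustrative forward pass with random, untrained weights and toy layer sizes, not the authors' implementation. Several details are assumptions that the caption does not state: the ReLU activations, the order of the two softmax outputs, and the exact form of the majority classifier (here, the fraction of a gene's introns whose essential probability is at least 0.5). Dropout and L2 regularization only matter during training and are omitted. All function names are hypothetical.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode DNA as rows of 4 floats; any non-ACGT symbol becomes all zeros."""
    return [[1.0 if ch == b else 0.0 for b in BASES] for ch in seq.upper()]

def conv_relu(x, filters, biases):
    """Valid 1D convolution followed by ReLU (the ReLU is an assumption)."""
    window = len(filters[0])
    out = []
    for pos in range(len(x) - window + 1):
        row = []
        for filt, bias in zip(filters, biases):
            total = bias + sum(filt[k][c] * x[pos + k][c]
                               for k in range(window) for c in range(4))
            row.append(max(0.0, total))
        out.append(row)
    return out

def global_average_pool(feature_map):
    """Average each filter's response over all sequence positions."""
    n = len(feature_map)
    return [sum(row[f] for row in feature_map) / n
            for f in range(len(feature_map[0]))]

def dense(vec, weights, biases):
    """Fully connected layer without activation."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(n_filters=8, window=24, hidden=8, seed=0):
    """Random toy weights; the caption's best model used 128 filters, window 24, 128 neurons."""
    rng = random.Random(seed)
    def rand():
        return rng.uniform(-0.1, 0.1)
    return {
        "filters": [[[rand() for _ in range(4)] for _ in range(window)]
                    for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "fc_w": [[rand() for _ in range(n_filters)] for _ in range(hidden)],
        "fc_b": [0.0] * hidden,
        "out_w": [[rand() for _ in range(hidden)] for _ in range(2)],
        "out_b": [0.0, 0.0],
    }

def predict(seq, params):
    """Return [P(nonessential), P(essential)]; the output order is an assumption."""
    features = conv_relu(one_hot(seq), params["filters"], params["conv_bias"])
    pooled = global_average_pool(features)
    hidden = [max(0.0, h) for h in dense(pooled, params["fc_w"], params["fc_b"])]
    return softmax(dense(hidden, params["out_w"], params["out_b"]))

def gene_score(essential_probs, threshold=0.5):
    """Fraction of a gene's introns voted essential (assumed majority rule)."""
    votes = [p >= threshold for p in essential_probs]
    return sum(votes) / len(votes)

def combined_gene_score(first_probs, last_probs):
    """Average the gene-level scores of the first-1,000-bp and last-1,000-bp models."""
    return 0.5 * (gene_score(first_probs) + gene_score(last_probs))

params = init_params()
intron = "GT" + "ACGT" * 30 + "AG"  # made-up 124 bp sequence, not real data
probs = predict(intron, params)
print(probs, combined_gene_score([0.9, 0.6, 0.2], [0.7, 0.4, 0.3]))
```

    In practice a deep learning framework would replace the hand-written layers; the pure-Python version is only meant to make the convolution, average pooling, fully connected, and softmax steps explicit.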

    Figure 2. Introns of essential genes differ from introns of nonessential genes by size, number, and position.

    (A) The dashed green line represents the mean, and the notches, calculated using a Gaussian-based asymptotic approximation, represent confidence intervals around the medians (orange lines). The first introns of essential (P = 0.0001), conditional (P < 0.00001), and nonessential (P < 0.00001) genes are longer than the gene’s later introns; however, this difference is smaller for essential genes than for nonessential genes. The nonessential first intron is much longer (mean 3.3 times greater) than the essential first intron (P < 0.00001). Later introns are also longer in nonessential genes than in essential genes (P < 0.00001), but the gap is smaller than for first introns. Introns of conditional genes typically fall in between.

    (B) Essential genes have more introns than both conditional (P = 0.021) and nonessential (P < 0.00001) genes.

    (C) However, essential genes have a shorter total length of intronic sequence than both conditional (P < 0.00001) and nonessential (P < 0.00001) genes.
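
    As a rough illustration of the quantities compared in Figure 2, the sketch below derives per-gene intron features from a list of intron lengths and computes notch bounds around a median as median ± 1.57 × IQR / √n. That is the formula Matplotlib uses by default for notched box plots; whether the authors used Matplotlib, and which statistical test produced the P values, is not stated in the caption. The function names and the example lengths are hypothetical.

```python
import math
import statistics

def intron_features(intron_lengths):
    """Summarize one gene from its intron lengths (bp), ordered 5' to 3'."""
    first, later = intron_lengths[0], intron_lengths[1:]
    return {
        "first_length": first,
        "mean_later_length": statistics.mean(later) if later else None,
        "intron_count": len(intron_lengths),
        "total_intron_length": sum(intron_lengths),
    }

def median_notch(values):
    """Notch bounds around the median: median +/- 1.57 * IQR / sqrt(n)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return med - half_width, med + half_width

# Made-up example gene with three introns (not real data).
features = intron_features([5000, 800, 1200])
low, high = median_notch([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(features, low, high)
```

    When the notches of two boxes do not overlap, the medians are commonly read as differing, which is how panel A conveys the first-intron versus later-intron contrast visually.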

    Figure 3. Introns of essential genes differ from introns of nonessential genes by GC density and lower frequency of unusual 5′/3′ splice sites.

    (A) The first introns of essential (P < 0.00001), conditional (P < 0.00001), and nonessential (P < 0.00001) genes have a higher GC density than the later introns. Essential (P = 0.0004) and conditional (P < 0.00001) genes have a higher density of GC regions in their first introns than nonessential genes do. The ratio of first-intron GC density to later-intron GC density is 1.13 for nonessential genes, 1.22 for conditional genes, and 1.35 for essential genes. GC density is greater in first introns of essential genes.

    (B) GC content, with GC motif content subtracted, has a similar distribution to GC motif density among introns split by first/later and essential/conditional/nonessential. GC content is particularly important in annealing strength and increasing gene stability.

    (C) Essential gene introns less frequently have an unusual 5′ splice site than conditional gene introns, which in turn less frequently have an unusual 5′ splice site than nonessential gene introns. The first intron of essential genes is less likely to have an unusual 5′ splice site than conditional or nonessential first introns. In addition, essential first introns are less likely to have an unusual 5′ splice site than essential later introns. Conditional first introns are less likely to have an unusual 5′ splice site than nonessential first introns, so this effect correlates with essentiality. The first intron of nonessential genes is the most likely to have an unusual 5′ splice site.

    (D) The first intron of essential genes is less likely to have an unusual 3′ splice site than the first intron of conditional genes, which in turn is less likely to have an unusual 3′ splice site than the first intron of nonessential genes. This effect again correlates with essentiality.
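
    The comparisons in Figure 3 can be approximated with a few small helpers. The sketch below rests on assumptions the caption does not spell out: it uses plain GC content (fraction of G and C bases) as a stand-in, because the caption does not define the GC motif behind "GC density", and it treats a splice site as "unusual" when the intron does not start with GT (5′) or does not end with AG (3′). The function names and sequences are made up for illustration.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def has_unusual_5prime(intron):
    """Assumed definition: the intron does not begin with the canonical GT."""
    return not intron.upper().startswith("GT")

def has_unusual_3prime(intron):
    """Assumed definition: the intron does not end with the canonical AG."""
    return not intron.upper().endswith("AG")

def mean_gc(introns):
    """Mean GC content over a group of intron sequences."""
    return sum(gc_content(s) for s in introns) / len(introns)

def first_to_later_gc_ratio(first_introns, later_introns):
    """Ratio of mean first-intron GC content to mean later-intron GC content."""
    return mean_gc(first_introns) / mean_gc(later_introns)

def fraction_unusual(introns, check):
    """Share of introns flagged by a splice-site check function."""
    return sum(1 for s in introns if check(s)) / len(introns)

# Made-up toy sequences (not real introns).
first_introns = ["GTGCGCGCAG"]
later_introns = ["GTATATATAG", "GCATGCATAC"]
ratio = first_to_later_gc_ratio(first_introns, later_introns)
unusual_5 = fraction_unusual(later_introns, has_unusual_5prime)
unusual_3 = fraction_unusual(later_introns, has_unusual_3prime)
print(ratio, unusual_5, unusual_3)
```

    Running the same helpers separately on the essential, conditional, and nonessential groups would give group-level ratios analogous to the 1.13, 1.22, and 1.35 values reported in panel A, although the published values are based on GC density rather than plain GC content.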

Citation Tools
Intron structures of essential genes
Ethan Schonfeld, Edward Vendrow, Joshua Vendrow, Elan Schonfeld
Life Science Alliance Apr 2021, 4 (6) e202000951; DOI: 10.26508/lsa.202000951

Subjects

  • Genetics, Gene Therapy & Genetic Disease
  • Systems & Computational Biology

ISSN: 2575-1077
© 2023 Life Science Alliance LLC

Life Science Alliance is registered as a trademark in the U.S. Patent and Trade Mark Office and in the European Union Intellectual Property Office.