Research Article
Transparent Process
Open Access

A novel Canis lupus familiaris reference genome improves variant resolution for use in breed-specific GWAS

Robert A Player (1), Ellen R Forsyth (1), Kathleen J Verratti (1), David W Mohr (2), Alan F Scott (2), Christopher E Bradburne (1, 2)
(1) Asymmetric Operations Sector, The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA
(2) McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
Published 29 January 2021. DOI: 10.26508/lsa.202000902

Figures

  • Figure 1.
    Figure 1. Diagram of wet laboratory workflow.

    Sample collection, extraction, and sequencing library preparation methods used in this study are shown.

  • Figure S1.
    Figure S1. Pulsed Field Gel Electrophoresis (PFGE) visualization of high molecular weight-DNA extracted from blood.

    The four methods used were phenol-chloroform extraction, PAXgene, MagMax, and Nanobind, applied to blood stored in two different preservation agents (PAXgene and EDTA). Samples are from four dogs (numbered 1–4 across the top). The lambda ladder is in the middle lane of each gel, indicated by a yellow “L” (48.5 Kb–1 Mb, 18 bands at 48.5 Kb steps). The PAXgene extraction kit is the only kit that failed to yield high molecular weight DNA. PFGE was performed on the Blue Pippin Pulse (Sage Science, PPI-0200) at 70 V for 20 h at 4°C. The program started with an initial 300 msec forward pulse followed by a 100 msec reverse pulse. At each step, 30 msec was added to the forward pulse and 10 msec to the reverse pulse, for a total of 45 steps per cycle, with the cycles repeating over the 20 h run.
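
The pulse ramp described in this caption can be sketched in a few lines. This is only an illustration under stated assumptions: the initial 300/100 msec pair is taken to be the first of the 45 steps, and any pause or dead time between pulses is ignored. The function name `pulse_schedule` is hypothetical and not part of any instrument software.

```python
def pulse_schedule(fwd_start_ms=300, rev_start_ms=100,
                   fwd_step_ms=30, rev_step_ms=10, steps=45):
    """Return the (forward, reverse) pulse lengths in msec for one cycle.

    Assumption: step 0 is the initial 300/100 msec pair, and each later
    step adds 30 msec (forward) and 10 msec (reverse).
    """
    return [(fwd_start_ms + k * fwd_step_ms, rev_start_ms + k * rev_step_ms)
            for k in range(steps)]


schedule = pulse_schedule()
cycle_ms = sum(fwd + rev for fwd, rev in schedule)  # pulse time in one cycle
run_ms = 20 * 60 * 60 * 1000                        # the 20 h run, in msec

print("first step:", schedule[0], "last step:", schedule[-1])
print(cycle_ms / 1000, "s per cycle,", run_ms / cycle_ms, "cycles in 20 h")
```

Under these assumptions one cycle lasts 57.6 s, so about 1,250 cycles fit in the 20 h run.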

  • Figure 2.
    Figure 2. Total extracted DNA and DNA quality from four tested isolation kits.

    (A) Total extracted DNA. (B) DNA quality; green line indicates the ideal 260/280 ratio for DNA purity at 1.80. Extractions from the Nanobind kit had the most consistently high yield and quality.

  • Figure S2.
    Figure S2. Visualization of Oxford Nanopore Technologies sequencing data.

    Sequence length and base call quality distributions of combined read data from all eight Oxford Nanopore Technologies flow cells used to generate at least 20× depth across the ∼2.3 Gb canine genome. Additional details may be found in Tables 3 and 4. Visualizations were produced using the program lrplot (https://github.com/ahcm/longread_plots).

  • Figure 3.
    Figure 3. Genome assembly contig count versus total length of assembly.

    Each point represents a distinct assembly resulting from one of 144 unique miniasm parameter combinations. Sequence data from eight ONT flow cells are represented in the plot, four from each of the Ligation and Rapid library preparation kits (SQK-LSK109 and SQK-RAD004, respectively). See Table 3 for details linking “estimated depth” to sequencing run and library kit. The estimated depth of 22.65 combines reads from all eight flow cells (black boxed region in upper left; see Fig 4 for details regarding parameters). Estimated depth is the total bps in the read set divided by the total length of the CanFam3.1 assembly, including Ns. Total bps of the assembly approaches the estimated total genome size as depth approaches 20×. Horizontal dashed red line: size of CanFam3.1 with Ns (2,327,604,993 bp). Vertical dashed red line: contig count (19,555) of CanFam3.1 chromosomal scaffolds broken at every occurrence of N. The estimated depths shown are from the rapid kit only (5.99, 6.65, and 12.64), the ligation kit only (3.96, 6.66, and 10.02), and a combination of the two (10.60, 12.05, and 22.65).
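
The depth estimate described in this caption (total bps in the read set divided by the CanFam3.1 length with Ns) can be sketched as follows. The per-flow-cell totals are copied from Table 4. The names `TOTAL_BP` and `estimated_depth` are illustrative only, and values computed this way may differ slightly from the depths reported in Table 3, whose exact read sets are not specified here.

```python
# CanFam3.1 length including Ns, as given in the Figure 3 caption.
CANFAM31_LENGTH_BP = 2_327_604_993

# Total bp per ONT flow cell, copied from Table 4 (flow cell number -> bp).
TOTAL_BP = {
    1: 6_274_113_013, 2: 7_769_391_385,   # run 1, SQK-LSK109
    3: 6_301_883_845, 4: 7_573_765_689,   # run 1, SQK-RAD004
    5: 4_282_119_674, 6: 4_889_116_279,   # run 2, SQK-LSK109
    7: 6_913_193_761, 8: 8_493_017_228,   # run 2, SQK-RAD004
}


def estimated_depth(flowcells, genome_bp=CANFAM31_LENGTH_BP):
    """Estimated depth = summed read bases / reference length (with Ns)."""
    return sum(TOTAL_BP[fc] for fc in flowcells) / genome_bp


print(round(estimated_depth(range(1, 9)), 2))   # all eight flow cells
print(round(estimated_depth([3, 4, 7, 8]), 2))  # rapid kit only
```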

  • Figure 4.
    Figure 4. Genome assembly contig count versus total length of assembly.

    The estimated genome depth of the data is 22.65×. Contig count was calculated by counting the headers in the resulting assembly FASTA files, and total length from the non-header character count. (A, B, C, D) Zoomed-in view of the top-left group of assemblies from Fig 3, colored by parameter value and broken down by miniasm parameter type: (A) i, ignore mappings with identity less than INT identity; (B) s, drop mappings shorter than INT total bps; (C) I, minimap overlap ratio; and (D) e, remove a contig if it is generated from fewer than INT reads. The miniasm parameter “m” (drop read mappings with fewer than INT matching bps) is left out because the points for the three values used (25, 50, and 100) all overlap (i.e., “m” has no effect on contig count or total bps). Default parameters for miniasm are: m = 100, i = 0.05, s = 1,000, I = 0.8, e = 4. The blue diamond indicates the down-selected assembly (v0.0 in Table 6) used for polishing and final scaffolding, with miniasm parameters m = 100, i = 0.05, s = 500, I = 0.8, e = 3. The red dashed line indicates the genome size (with Ns) of CanFam3.1.
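
The two plotted quantities (contig count from FASTA headers, total length from non-header characters) can be computed with a short routine like the one below. It is a minimal sketch, not the authors' script; the function name `fasta_stats` and the toy records are invented for illustration.

```python
import io


def fasta_stats(lines):
    """Return (contig_count, total_length) for an iterable of FASTA lines.

    Headers are lines starting with '>'. Every other line contributes its
    length after trailing whitespace and newlines are removed.
    """
    contigs = 0
    total_length = 0
    for line in lines:
        if line.startswith(">"):
            contigs += 1
        else:
            total_length += len(line.strip())
    return contigs, total_length


# Toy example with two made-up contigs; a real assembly would be read
# with open("assembly.fa") instead of io.StringIO.
toy = io.StringIO(">contig1\nACGTAC\nGT\n>contig2\nGGGCCC\n")
print(fasta_stats(toy))
```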

  • Figure 5.
    Figure 5. Alignment rates and total variants of 10 Labrador Retriever Illumina data sets from Sequence Read Archive.

    Accessions and additional metrics can be found in Table S2. (A) Read alignment rates to the CF (GCF_000002285.3, CanFam3.1, Boxer breed), GS (GCA_008641245.1, German Shepherd breed), and YA (Yella v1.0; Labrador Retriever breed) reference genomes (paired t test; CF versus YA P-value = 2.457 × 10^-6, GS versus YA P-value = 1.397 × 10^-3). (B) Total variants detected at Q > 29 against each reference (paired t test; CF versus YA P-value = 4.744 × 10^-6, GS versus YA P-value = 3.931 × 10^-6).

  • Figure S3.
    Figure S3. Annotation maps of RefSeq canine mitochondrial sequences.

    Sequence annotation maps of the RefSeq canine mitochondrial sequence (top) and the Yella v1.0 mitochondrial sequence (bottom). Maps were generated with GeSeq’s Chlorobox annotation and visualization web tool (reference below). Alignment of Yella MT to RefSeq MT reveals 3 bps of insertion and 74 bps of deletion (alignment CIGAR: 2678M1I7233M2I6327M50D36M24D379M). Needleman-Wunsch pairwise alignment yields 99.41% identity and similarity between these MT sequences.
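
The insertion and deletion totals quoted in this caption can be checked directly from the CIGAR string. The sketch below tallies bases per CIGAR operation. It assumes the usual convention that the query is the Yella MT sequence and the reference is the RefSeq MT sequence, and the function name `cigar_totals` is illustrative.

```python
import re
from collections import Counter

CIGAR = "2678M1I7233M2I6327M50D36M24D379M"  # from the Figure S3 caption


def cigar_totals(cigar):
    """Sum the lengths of each CIGAR operation (M, I, D, ...)."""
    totals = Counter()
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        totals[op] += int(length)
    return totals


totals = cigar_totals(CIGAR)
query_len = totals["M"] + totals["I"]      # bases consumed from the query
reference_len = totals["M"] + totals["D"]  # bases consumed from the reference

print(dict(totals))
print("query length:", query_len, "reference length:", reference_len)
```

Note that M operations cover both matches and mismatches, so percent identity cannot be recovered from this CIGAR string alone.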

  • Figure 6.
    Figure 6. Diagram of phased assembly pipeline.

    Divided into four primary sections: De Novo Assembly (Oxford Nanopore Technologies), De Novo Assembly (10X), Assembly Polishing, and Scaffolding.

Tables

    Table 1.

    Effect of blood sample preservation agent on DNA yield.

    Storage agent | NA isolation method | Input volume (uL) | Output volume (uL) | NA conc (ng/uL) | Recovered NA total (ng) | NA total per mL blood (ng) | NA quality (260/280) | HMW DNA yielded?
    Proprietary (PAXgene) | PCE | 1,700 | 1,000 | 6.37 | 6,370 | 3,747 | 2.20 | Yes
    Proprietary (PAXgene) | Magmax Core NA purification | 200 | 90 | 2.03 | 183 | 914 | 1.66 | Yes
    Proprietary (PAXgene) | Nanobind CBB Big DNA kit | 200 | 100 | 11.10 | 1,110 | 5,550 | 1.87 | Yes
    Proprietary (PAXgene) | PAXgene Blood DNA kit | 1,700 | 1,000 | 6.40 | 6,400 | 3,765 | 2.38 | No
    EDTA (purple top) | PCE | 1,700 | 1,000 | 0.38 | 380 | 224 | 5.21 | Yes
    EDTA (purple top) | Magmax Core NA purification | 200 | 90 | 2.63 | 237 | 1,184 | 1.62 | Yes
    EDTA (purple top) | Nanobind CBB Big DNA kit | 200 | 100 | 35.30 | 3,530 | 17,650 | 1.84 | Yes
    EDTA (purple top) | PAXgene Blood DNA kit | 1,700 | 1,000 | 10.80 | 10,800 | 6,353 | 1.98 | No
    • Blood from one canine (Yella) was drawn directly into two tubes containing either a proprietary preservation agent or EDTA. Three kits were tested against a standard phenol-chloroform extraction (PCE) method. Input and output volumes for each kit are shown, along with the actual recovered total DNA mass. NA stands for nucleic acid. EDTA stands for ethylenediaminetetraacetic acid. The extremely high 260/280 ratio (5.21) observed for PCE is likely due to residual phenol in some samples, which is known to raise the 260/280 ratio beyond the normal quality range.
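
The derived columns of Table 1 appear to follow from simple arithmetic on the measured ones. The sketch below reproduces them under the assumption that the input volume is the volume of blood processed. The function names are illustrative and not taken from the study.

```python
def recovered_total_ng(conc_ng_per_ul, output_ul):
    """Recovered NA mass (ng) = concentration (ng/uL) x output volume (uL)."""
    return conc_ng_per_ul * output_ul


def total_ng_per_ml_blood(total_ng, input_ul):
    """Normalize recovered mass to 1 mL (1,000 uL) of input blood."""
    return total_ng / (input_ul / 1000.0)


# EDTA tube, Nanobind CBB Big DNA kit row of Table 1.
total = recovered_total_ng(35.30, 100)
print(round(total), round(total_ng_per_ml_blood(total, 200)))

# PAXgene tube, PCE row of Table 1.
total = recovered_total_ng(6.37, 1000)
print(round(total), round(total_ng_per_ml_blood(total, 1700)))
```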

    Table 2.

    Variability of NA (nucleic acid) isolation method across four canine blood samples preserved in “purple top” tubes with EDTA.

    NA isolation method | Total NA (ug) mean | Total NA Std. Dev. | NA quality (260/280) mean | NA quality (260/280) Std. Dev. | High-MW DNA?
    PCE | 1.28 | 1.63 | 1.75 | 3.10 | Yes
    Magmax Core NA purification | 0.81 | 1.02 | 1.57 | 0.05 | Yes
    Nanobind CBB Big DNA kit | 2.92 | 1.99 | 1.85 | 0.03 | Yes
    PAXgene Blood DNA kit | 4.08 | 4.85 | 1.73 | 0.79 | No
    • DNA from purple top tubes was extracted using either phenol–chloroform extraction (PCE) or one of three commercial kits (Magmax, Nanobind, and PAXgene). Bold values represent the best performance in a particular category.

    Table 3.

    Breakdown of ONT sequencing runs, flow cells, library kit type, and estimated depth shown in Fig 3.

    Run # | Flowcell # | ONT kit | Total flow cells | Est. depth
    1 | 1, 2 | SQK-LSK109 | 2 | 6.66
    2 | 5, 6 | SQK-LSK109 | 2 | 3.96
    1 + 2 | 1, 2, 5, 6 | SQK-LSK109 | 4 | 10.02
    1 | 3, 4 | SQK-RAD004 | 2 | 5.99
    2 | 7, 8 | SQK-RAD004 | 2 | 6.65
    1 + 2 | 3, 4, 7, 8 | SQK-RAD004 | 4 | 12.64
    1 | 1, 2, 3, 4 | RAD+LSK | 4 | 12.05
    2 | 5, 6, 7, 8 | RAD+LSK | 4 | 10.60
    1 + 2 | 1, 2, 3, 4, 5, 6, 7, 8 | RAD+LSK | 8 | 22.65
    • Flowcell number from Table 4.

    Table 4.

    Oxford Nanopore GridION sequencing run summaries using R9.4.1 flowcells.

    Run | Flowcell # | ONT kit | Total bp | Total reads | Read N50 | Mean quality (Phred)
    1 | 1 | SQK-LSK109 | 6,274,113,013 | 658,356 | 22,619 | 11.7
    1 | 2 | SQK-LSK109 (a) | 7,769,391,385 | 934,471 | 18,562 | 12.2
    1 | 3 | SQK-RAD004 | 6,301,883,845 | 1,026,445 | 11,868 | 11.9
    1 | 4 | SQK-RAD004 (a) | 7,573,765,689 | 1,216,984 | 12,320 | 11.3
    2 | 5 | SQK-LSK109 | 4,282,119,674 | 392,256 | 35,584 | 11.38
    2 | 6 | SQK-LSK109 | 4,889,116,279 | 538,051 | 26,881 | 12.07
    2 | 7 | SQK-RAD004 | 6,913,193,761 | 1,128,659 | 18,562 | 10.58
    2 | 8 | SQK-RAD004 | 8,493,017,228 | 1,830,809 | 11,868 | 10.51
    • SQK-LSK109 is the ligation based library preparation kit. SQK-RAD004 is the transposon based rapid library preparation kit.

    • (a) Size selection was performed on extracted DNA before library preparation, using the Circulomics Short Read Eliminator kit.

    Table 5.

    Illumina 10X library, NovaSeq S1 flowcell 300 cycle sequencing run summaries.

    Run | Lane | Paired read | Raw total bps | Raw total reads | Trimmed total bps | Trimmed total reads
    3 | 1 | 1 | 2.36E+10 | 156,607,429 | 2.00E+10 | 155,880,038
    3 | 1 | 2 | 2.36E+10 | 156,607,429 | 2.35E+10 | 155,880,038
    3 | 2 | 1 | 2.29E+10 | 151,709,875 | 1.94E+10 | 151,035,675
    3 | 2 | 2 | 2.29E+10 | 151,709,875 | 2.27E+10 | 151,035,675
    4 | 1 | 1 | 3.16E+10 | 209,187,620 | 2.68E+10 | 208,419,758
    4 | 1 | 2 | 3.16E+10 | 209,187,620 | 3.14E+10 | 208,419,758
    4 | 2 | 1 | 3.24E+10 | 214,451,964 | 2.75E+10 | 213,618,769
    4 | 2 | 2 | 3.24E+10 | 214,451,964 | 3.22E+10 | 213,618,769
    Totals | | | 2.21E+11 | 1,463,913,776 | 2.03E+11 | 1,457,908,480
    Est. depth | | | 95.38 | | 87.80 |
    • Insert size was ∼400 bp; these libraries were not prepared with the intention of joining read pairs (hence the 100-bp gap between pairs). Quality and adapter trimming were performed with cutadapt (including clipping the first 22 bases from R1).

    Table 6.

    Assembly metrics of Yella dog genome through the scaffolding process, with related dog genome assembly metrics for comparison.

    Description | Total contigs | Largest contig | Total length (Gb) | GC content | N50 (Mb) | L50 | N per 100 Kb | BUSCO complete | BUSCO fragmented
    CF, GCF_000002285.3 | 82 | 123,773,608 | 2.328 | 41.06% | 47.7 | 19 | 429 | 95.20% | 2.50%
    GS, GCA_008641245.1 | 40 | 126,700,074 | 2.367 | 41.21% | 64.5 | 14 | 236 | 93.70% | 3.40%
    CFGS, RaGOO of CF onto GS | 40 | 123,868,242 | 2.328 | 41.06% | 64.2 | 14 | 430 | 92.90% | 3.80%
    JHMI 10X pseudohap | 10,391 | 96,528,903 | 2.417 | 41.25% | 39.2 | 22 | 1,901 | 92.70% | 4.40%
    v0.0 | 1,601 | 20,780,228 | 2.299 | 41.11% | 5.5 | 130 | 0 | 0.20% | 1.10%
    v0.1 | 1,600 | 21,039,211 | 2.326 | 40.98% | 5.6 | 130 | 0 | 32.00% | 21.80%
    v0.2 | 1,600 | 21,018,819 | 2.324 | 41.17% | 5.6 | 130 | 0 | 94.80% | 2.70%
    v0.3a | 1,412 | 21,088,418 | 2.394 | 41.30% | 5.4 | 134 | 270 | 95.20% | 2.60%
    v0.3b | 1,413 | 21,084,388 | 2.394 | 41.30% | 5.4 | 134 | 270 | 95.20% | 2.50%
    v0.4 | 40 | 131,668,473 | 2.435 | 41.30% | 64.9 | 14 | 1,972 | 92.40% | 4.20%
    v1.0a | 40 | 138,659,542 | 2.394 | 41.30% | 64.3 | 14 | 276 | 95.00% | 2.50%
    v1.0b | 40 | 138,666,786 | 2.493 | 41.30% | 64.3 | 14 | 276 | 95.10% | 2.30%
    • The (a) and (b) suffixes represent the different haplotype genomes. BUSCO scores calculated using v3 with the mammalia_odb9 dataset (missing % equals 100 − [Complete + Fragmented]).
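
Two of the Table 6 metrics are easy to reproduce. The sketch below uses the standard definitions of N50 and L50 (the contig length, and the number of contigs, at which the running sum of lengths sorted largest-first reaches half the assembly) together with the BUSCO "missing" formula given in the footnote above. The function names and the toy contig lengths are illustrative, not taken from the study.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs given")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


def busco_missing_pct(complete_pct, fragmented_pct):
    """Missing % = 100 - (Complete + Fragmented), rounded to 2 decimals."""
    return round(100 - (complete_pct + fragmented_pct), 2)


print(n50_l50([10, 8, 6, 4, 2]))       # toy lengths, not real contigs
print(busco_missing_pct(95.20, 2.50))  # CF row of Table 6
print(busco_missing_pct(95.00, 2.50))  # v1.0a row of Table 6
```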

Supplementary Materials

  • Table S1 Supplementary data from two storage and four nucleic acid (NA) extraction kits.

  • Table S2 Alignment rates and total variants of 10 Labrador Retriever Illumina sequence read data sets from Sequence Read Archive, with additional metrics and summary statistics.

  • Supplemental Data 1.

    [LSA-2020-00902_Supplemental_Data_1.pdf]

Phased genome assembly for a Labrador Retriever
Robert A Player, Ellen R Forsyth, Kathleen J Verratti, David W Mohr, Alan F Scott, Christopher E Bradburne
Life Science Alliance Jan 2021, 4 (4) e202000902; DOI: 10.26508/lsa.202000902

Volume 4, No. 4
April 2021
ISSN: 2575-1077
© 2023 Life Science Alliance LLC
