  • Letter

Next-generation genotype imputation service and methods

Abstract

Genotype imputation is a key component of genetic association studies, where it increases power, facilitates meta-analysis, and aids interpretation of signals. Genotype imputation is computationally demanding and, with current tools, typically requires access to a high-performance computing cluster and to a reference panel of sequenced genomes. Here we describe improvements to imputation machinery that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools. We also describe a new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity.

Figure 1: Overview of state space reduction.
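As a rough illustration of how a state space reduction could cut the computation mentioned in the abstract, the sketch below assumes that the reduction works by collapsing reference haplotypes that are identical within a short genomic block, so that a hidden Markov model only needs to visit the unique haplotypes in that block. This preview does not spell out the method, so the function name `collapse_block`, the toy panel, and the block boundaries are all hypothetical.

```python
from typing import Dict, List, Tuple


def collapse_block(haplotypes: List[str], start: int, end: int) -> Tuple[List[str], List[int]]:
    """Collapse haplotypes that are identical within sites [start, end).

    Returns the unique sub-haplotypes in first-seen order and, for each
    input haplotype, the index of the unique state it maps to.
    """
    index_of: Dict[str, int] = {}
    unique: List[str] = []
    mapping: List[int] = []
    for hap in haplotypes:
        segment = hap[start:end]
        if segment not in index_of:
            index_of[segment] = len(unique)
            unique.append(segment)
        mapping.append(index_of[segment])
    return unique, mapping


# Hypothetical toy reference panel: 6 haplotypes over 8 biallelic sites
# (0 = reference allele, 1 = alternate allele).
panel = [
    "00110101",
    "00110110",
    "00111101",
    "11000101",
    "11000110",
    "00110101",
]

# Hypothetical block boundaries; a real tool would choose these from the data.
blocks = [(0, 4), (4, 8)]

for start, end in blocks:
    unique, mapping = collapse_block(panel, start, end)
    print(f"sites {start}-{end - 1}: {len(panel)} haplotypes -> "
          f"{len(unique)} unique states, mapping={mapping}")
```

In this toy panel the first block collapses six haplotypes into two states and the second into three, so per-site work in each block would scale with the number of unique states rather than with the panel size. The saving on a real panel depends on how the blocks are chosen.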

Acknowledgements

The authors gratefully acknowledge D. Hinds for assistance with minimac3 code optimizations and A.L. Williams for providing HAPI-UR. We acknowledge support from National Institutes of Health grants HG007022 and HL117626 (G.R.A.), HG000376 (M.B.), and R01DA037904 (S.I.V.), Austrian Science Fund (FWF) grant J-3401 (C.F.), and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 602133 (L.F. and S.S.). This work was also supported in part by the Intramural Research Program of the National Institute on Aging, National Institutes of Health (D. Schlessinger).

Author information

Contributions

S.D., L.F., S.S., G.R.A., and C.F. designed the methods and experiments. C.S., A.E.L., A.K., S.I.V., E.Y.C., S.L., M.M., D. Schlessinger, P.-R.L., D. Stambolian, W.G.I., A.S., L.J.S., F.C., F.K., and M.B. provided data or tools. S.D., G.R.A., and C.F. wrote the first draft. All authors contributed critical reviews of the manuscript during its preparation.

Corresponding authors

Correspondence to Gonçalo R Abecasis or Christian Fuchsberger.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Imputation server overview.

The imputation workflow uses two MapReduce jobs to parallelize the quality control and the phasing/imputation step.
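The map and reduce pattern behind such a job can be sketched in a few lines. The example below is a minimal single-machine stand-in, not the server's code: it splits variants into fixed-size genomic chunks, runs a quality control function on each chunk in parallel (the map step), and merges the per-chunk counts (the reduce step). The record layout, the 1,000 bp chunk size, and the QC rule are all assumptions made for illustration, and a thread pool stands in for a MapReduce cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (position, reference allele, alternate allele).
Variant = Tuple[int, str, str]

VALID_ALLELES = set("ACGT")


def split_into_chunks(variants: List[Variant], chunk_size_bp: int) -> Dict[int, List[Variant]]:
    """Group variants into chunks keyed by position // chunk_size_bp."""
    chunks: Dict[int, List[Variant]] = {}
    for variant in variants:
        chunks.setdefault(variant[0] // chunk_size_bp, []).append(variant)
    return chunks


def qc_chunk(chunk: List[Variant]) -> Counter:
    """Map step: count variants passing a toy QC rule within one chunk.

    The rule (single-base A/C/G/T alleles, ref != alt) is an assumption,
    not the filter used by the actual service.
    """
    counts: Counter = Counter()
    for _pos, ref, alt in chunk:
        ok = ref in VALID_ALLELES and alt in VALID_ALLELES and ref != alt
        counts["pass" if ok else "fail"] += 1
    return counts


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: sum the per-chunk counters into one summary."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


variants = [
    (100, "A", "G"),
    (250, "C", "C"),   # fails: ref equals alt
    (1200, "T", "N"),  # fails: invalid allele
    (1900, "G", "A"),
    (2500, "A", "T"),
]

chunks = split_into_chunks(variants, chunk_size_bp=1000)
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(qc_chunk, chunks.values()))
summary = merge_counts(partials)
print(dict(summary))
```

A second job for phasing and imputation would follow the same shape, with a per-chunk phasing and imputation function as the mapper and a step that concatenates the chunk outputs as the reducer.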

Supplementary Figure 2 Quality control workflow for each variant site.

Supplementary Figure 3 Parameter estimation study.

The figure compares the imputation accuracy across three parameter estimation methods on six different populations from the Human Genome Diversity Project (HGDP) on chromosomes 20–22.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3, Supplementary Tables 1–4 and Supplementary Note. (PDF 2575 kb)

About this article

Cite this article

Das, S., Forer, L., Schönherr, S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284–1287 (2016). https://doi.org/10.1038/ng.3656

  • DOI: https://doi.org/10.1038/ng.3656
