Improving polygenic prediction in ancestrally diverse populations

Ruan, Yunfeng; Lin, Yen-Feng; Feng, Yen-Chen Anne; Chen, Chia-Yen; Lam, Max; Guo, Zhenglin; He, Lin; Sawa, Akira; Martin, Alicia R.; Qin, Shengying; Huang, Hailiang; Ge, Tian

doi:10.1038/s41588-022-01054-7

Article
Published: 05 May 2022

Nature Genetics volume 54, pages 573–580 (2022)

An Author Correction to this article was published on 04 July 2022

This article has been updated

Abstract

Polygenic risk scores (PRS) have attenuated cross-population predictive performance. As existing genome-wide association studies (GWAS) have been conducted predominantly in individuals of European descent, the limited transferability of PRS reduces their clinical value in non-European populations, and may exacerbate healthcare disparities. Recent efforts to level ancestry imbalance in genomic research have expanded the scale of non-European GWAS, although most remain underpowered. Here, we present a new PRS construction method, PRS-CSx, which improves cross-population polygenic prediction by integrating GWAS summary statistics from multiple populations. PRS-CSx couples genetic effects across populations via a shared continuous shrinkage (CS) prior, enabling more accurate effect size estimation by sharing information between summary statistics and leveraging linkage disequilibrium diversity across discovery samples, while inheriting computational efficiency and robustness from PRS-CS. We show that PRS-CSx outperforms alternative methods across traits with a wide range of genetic architectures, cross-population genetic overlaps and discovery GWAS sample sizes in simulations, and improves the prediction of quantitative traits and schizophrenia risk in non-European populations.
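
The idea of a shrinkage parameter shared across populations can be illustrated with a minimal sketch. This is not the PRS-CSx implementation: it assumes unlinked SNPs (no LD), standardized genotypes, and a normal prior on each effect whose variance is a local scale `psi` times the sampling variance, so that the posterior mean is the marginal estimate multiplied by `psi / (1 + psi)`. The function names, the fixed `psi` values and the toy dosages are hypothetical and chosen only for illustration; in PRS-CSx the shrinkage parameters are inferred from the data rather than fixed.

```python
import numpy as np

def shrink_shared(beta_hat, psi):
    """Toy posterior means under a per-SNP scale shared across populations.

    beta_hat: array (K populations, M SNPs) of marginal effect estimates.
    psi:      array (M,) of local shrinkage scales, one per SNP, shared by
              all populations (assumed known here; an illustration only).
    Assumes no LD, so each SNP is shrunk independently by psi / (1 + psi).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    psi = np.asarray(psi, dtype=float)
    return beta_hat * (psi / (1.0 + psi))

def polygenic_score(dosages, weights):
    """PRS as the weighted sum of allele dosages (individuals x SNPs)."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)

# Two populations, three SNPs: the same psi_j shrinks SNP j in both populations,
# while the effect estimates themselves remain population specific.
beta_hat = np.array([[0.2, 0.2, 0.2],
                     [0.4, 0.4, 0.4]])
psi = np.array([0.0, 1.0, 3.0])
posterior = shrink_shared(beta_hat, psi)

# Score two individuals with the weights of the first population.
scores = polygenic_score([[0, 1, 2], [2, 0, 1]], posterior[0])
```

A SNP with `psi = 0` is shrunk to zero in every population, whereas a SNP with a large `psi` is left almost unshrunk in every population; this is the sense in which a shared scale couples the population-specific effects in this toy setting.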

**Fig. 1: Overview of polygenic prediction methods.**

**Fig. 2: Prediction accuracy of single-discovery and multi-discovery polygenic prediction methods in simulations.**

**Fig. 3: Relative prediction accuracy for quantitative traits in each target population.**

**Fig. 4: Prediction accuracy of schizophrenia risk in EAS cohorts.**

Data availability

Publicly available data can be obtained from the following sites: 1KG Phase 3 reference panels: https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html; Genetic map for each subpopulation: ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rates; UKBB summary statistics: http://www.nealelab.is/uk-biobank (‘GWAS round 2’ was used in this study); BBJ summary statistics were downloaded from PheWeb: https://pheweb.jp; PAGE summary statistics were downloaded from the GWAS Catalog: https://www.ebi.ac.uk/gwas/downloads/summary-statistics; PGC wave 2 schizophrenia GWAS (49 EUR cohorts): https://www.med.unc.edu/pgc/download-results/; leave-one-out schizophrenia EAS summary statistics are available upon request to the Schizophrenia Working Group of the PGC (https://www.med.unc.edu/pgc/pgc-workgroups/schizophrenia/). These leave-one-out summary statistics are under controlled access per the data use limitations imposed by compliance, participant consent and/or national laws. Application to access such data requires a short research proposal that will go through the review and approval process of the PGC. This process takes 2 weeks. Individual-level schizophrenia data of East Asian ancestry are available upon application to the Stanley Global Asia Initiatives: SGAI@broadinstitute.org. These data are under controlled access owing to the data use limitations imposed by compliance, participant consent and national laws. Application to access such data requires a short research proposal that will be reviewed by the principal investigator of the constituent study and, if necessary, by the respective ethics committee. The principal investigator review process takes 2 weeks. TWB data used in this study contain protected health information and are thus under controlled access. Application to access such data can be made to the TWB (https://www.twbiobank.org.tw/new_web_en/).
Posterior SNP effect size estimates generated by PRS-CSx for the traits examined in this work: https://github.com/getian107/PRScsx.

Code availability

The code used in this study is available from the following websites: PRS-CSx: https://github.com/getian107/PRScsx (https://doi.org/10.5281/zenodo.5893746); PRS-CS: https://github.com/getian107/PRScs (https://doi.org/10.5281/zenodo.5893748); LDpred2: https://privefl.github.io/bigsnpr/articles/LDpred2; PRSice-2: https://www.prsice.info; HAPGEN2: https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html; PLINK 1.9: https://www.cog-genomics.org/plink; PLINK 2.0: https://www.cog-genomics.org/plink/2.0/; LD score regression: https://github.com/bulik/ldsc; POPCORN: https://github.com/brielin/Popcorn; Interpolation of genetic maps: https://github.com/joepickrell/1000-genomes-genetic-maps; Population assignment: https://github.com/Annefeng/PBK-QC-pipeline.

Change history

04 July 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41588-022-01144-6

Acknowledgements

We thank B. Neale, M. Daly, R. Do and A. Bloemendal for helpful discussions. We thank the Neale Laboratory and BBJ for releasing the genome-wide association summary statistics from UKBB and BBJ. Individual-level phenotypes and genotypes for UKBB samples were obtained under application 32568. We thank the Schizophrenia Working Group of the PGC for providing the GWAS summary statistics for schizophrenia. T.G. is supported by National Institute on Aging (NIA) K99/R00AG054573, National Human Genome Research Institute (NHGRI) U01HG008685 and NHGRI U01HG011723. H.H. acknowledges support from National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) K01DK114379, National Institute of Mental Health (NIMH) U01MH109539, Brain and Behavior Research Foundation Young Investigator Grant (28450), the Zhengxu and Ying He Foundation, and the Stanley Center for Psychiatric Research. L.H. and S.Q. are supported by Shanghai Municipal Science and Technology Major Project (2017SHZDZX01). A.R.M. is supported by NIMH K99/R00MH117229. A.S. is supported by NIMH P50MH094268. Y.-C.A.F. is supported by the ‘National Taiwan University Higher Education Sprout Project (NTU-110L8810)’ within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan. Y.-F.L. is supported by the National Health Research Institutes (NP-109-PP-09), and the Ministry of Science and Technology (109-2314-B-400-017) of Taiwan.

Author information

These authors jointly supervised this work: Shengying Qin, Hailiang Huang, Tian Ge.

Authors and Affiliations

Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Yunfeng Ruan, Yen-Chen Anne Feng, Max Lam, Zhenglin Guo, Ruize Liu, Alice Zheng, Alicia R. Martin, Hailiang Huang & Tian Ge
Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
Yunfeng Ruan, Lin He & Shengying Qin
Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
Yen-Feng Lin
Department of Public Health and Medical Humanities, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
Yen-Feng Lin
Institute of Behavioral Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
Yen-Feng Lin
Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Yen-Chen Anne Feng & Tian Ge
Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Yen-Chen Anne Feng & Tian Ge
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
Yen-Chen Anne Feng, Max Lam, Ruize Liu, Alicia R. Martin & Hailiang Huang
Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
Yen-Chen Anne Feng
Master of Public Health Program, National Taiwan University, Taipei, Taiwan
Yen-Chen Anne Feng
Biogen, Cambridge, MA, USA
Chia-Yen Chen
Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, USA
Max Lam
Research Division, Institute of Mental Health Singapore, Singapore, Singapore
Max Lam
Human Genetics, Genome Institute of Singapore, Singapore, Singapore
Max Lam
Departments of Psychiatry, Neuroscience, Biomedical Engineering, Genetic Medicine, and Mental Health, Johns Hopkins University School of Medicine and Bloomberg School of Public Health, Baltimore, MD, USA
Akira Sawa
Department of Medicine, Harvard Medical School, Boston, MA, USA
Alicia R. Martin & Hailiang Huang
Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
Tian Ge
Department of Psychiatry, Seoul National University Hospital, Seoul, Korea
Yong Min Ahn, Kyooseob Ha & Se Hyun Kim
Department of Biological Psychiatry and Neuroscience, Dokkyo Medical University School of Medicine, Mibu, Japan
Kazufumi Akiyama
Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
Makoto Arai, Yasue Horiuchi & Masanari Itokawa
Department of Psychiatry, Sungkyunkwan University, Samsung Medical Center, Seoul, Korea
Ji Hyun Baek & Kyung Sue Hong
Department of Psychiatry, National Taiwan University Hospital and College of Medicine, National Taiwan University, Taipei, Taiwan
Wei J. Chen
Department of Psychiatry, Chonbuk National University Medical School, Jeonbuk, Korea
Young-Chul Chung
Digital China Health Technologies Corp. Ltd., Beijing, China
Gang Feng & Wenzhao Shi
Department of Psychiatry, Shiga University of Medical Science, Shiga, Japan
Kumiko Fujii & Yuji Ozeki
Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
Stephen J. Glatt
Department of Neuroscience and Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
Stephen J. Glatt
National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan
Kotaro Hattori, Sayuri Ishiwata & Hiroshi Kunugi
National Center of Neurology and Psychiatry, Tokyo, Japan
Teruhiko Higuchi
Department of Psychiatry, Kobe University Graduate School of Medicine, Kobe, Japan
Akitoyo Hishimoto, Ikuo Otsuka & Ichiro Sora
Department of Psychiatry, National Taiwan University, Taipei, Taiwan
Hai-Gwo Hwu
Department of Psychiatry, Fujita Health University School of Medicine, Toyoake, Japan
Masashi Ikeda & Nakao Iwata
Department of Neuropsychiatry, School of Medicine, Eulji University, Daejeon, Korea
Eun-Jeong Joo
Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Rene S. Kahn
Department of Psychiatry, Chonnam National University Medical School, Gwangju, Korea
Sung-Wan Kim
Department of Psychiatry, Yonsei University College of Medicine, Seoul, Korea
Se Joo Kim
Department of Psychiatry, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
Makoto Kinoshita, Shusuke Numata & Tetsuro Ohmori
Department of Psychiatry, University of Indonesia, Jakarta, Indonesia
Agung Kusumawardhani
Institute of Mental Health, Singapore, Singapore
Jimmy Lee & Kang Sim
Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
Jimmy Lee
Department of Psychiatry, Pusan National University Hospital, Busan, Korea
Byung Dae Lee
Department of Psychiatry, Korea University College of Medicine, Seoul, Korea
Heon-Jeong Lee
Genome Institute of Singapore, A*STAR, Singapore, Singapore
Jianjun Liu
Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Jianjun Liu
Department of Psychiatry, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
Xiancang Ma
Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Korea
Woojae Myung
Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
Ikuo Otsuka
School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, Australia
Sibylle G. Schwab
Illawarra Health and Medical Research Institute, Wollongong, Australia
Sibylle G. Schwab
Department of Psychiatry, Dokkyo Medical University School of Medicine, Mibu, Japan
Kazutaka Shimoda
Department of Psychiatry, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
Jinsong Tang
Key Laboratory of Medical Neurobiology of Zhejiang Province, Hangzhou, Zhejiang, China
Jinsong Tang
Department of Psychiatry, the Second Xiangya Hospital, Central South University, Changsha, Hunan, China
Jinsong Tang
National Clinical Research Center on Mental Disorders, Changsha, Hunan, China
Jinsong Tang
Laboratory for Molecular Psychiatry, RIKEN Center for Brain Science, Wako, Japan
Tomoko Toyota & Takeo Yoshikawa
Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Ming Tsuang
University of Western Australia, Perth, Australia
Dieter B. Wildenauer
Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Korea
Hong-Hee Won
Center for Translational Medicine, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
Feng Zhu

Consortia

Stanley Global Asia Initiatives

Yong Min Ahn, Kazufumi Akiyama, Makoto Arai, Ji Hyun Baek, Wei J. Chen, Young-Chul Chung, Gang Feng, Kumiko Fujii, Stephen J. Glatt, Zhenglin Guo, Kyooseob Ha, Kotaro Hattori, Teruhiko Higuchi, Akitoyo Hishimoto, Kyung Sue Hong, Yasue Horiuchi, Hailiang Huang, Hai-Gwo Hwu, Masashi Ikeda, Sayuri Ishiwata, Masanari Itokawa, Nakao Iwata, Eun-Jeong Joo, Rene S. Kahn, Sung-Wan Kim, Se Joo Kim, Se Hyun Kim, Makoto Kinoshita, Hiroshi Kunugi, Agung Kusumawardhani, Jimmy Lee, Byung Dae Lee, Heon-Jeong Lee, Jianjun Liu, Ruize Liu, Xiancang Ma, Woojae Myung, Shusuke Numata, Tetsuro Ohmori, Ikuo Otsuka, Yuji Ozeki, Shengying Qin, Yunfeng Ruan, Akira Sawa, Sibylle G. Schwab, Wenzhao Shi, Kazutaka Shimoda, Kang Sim, Ichiro Sora, Jinsong Tang, Tomoko Toyota, Ming Tsuang, Dieter B. Wildenauer, Hong-Hee Won, Takeo Yoshikawa, Alice Zheng & Feng Zhu

Contributions

H.H. and T.G. designed the project; T.G. developed the statistical methods and programmed the code for PRS-CSx. Y.R. and T.G. conducted simulation studies. Y.R. and T.G. performed the analysis in the UK Biobank; Y.-F.L. performed the analysis in the Taiwan Biobank. Y.R. performed the analysis in the schizophrenia cohorts. Y.-C.A.F. assigned the UKBB samples into superpopulation groups. C.-Y.C. provided critical suggestions for the study design. M.L. took part in the testing of the code and preprocessed schizophrenia East Asian cohorts. Z.G., L.H., A.S. and S.Q. contributed to the generation and preprocessing of schizophrenia East Asian data. Y.R., H.H. and T.G. wrote the manuscript; Y.-C.A.F., C.-Y.C. and A.R.M. provided critical revision for the manuscript. All the authors reviewed and approved the final version of the manuscript.

Corresponding authors

Correspondence to Shengying Qin, Hailiang Huang or Tian Ge.

Ethics declarations

Competing interests

C.-Y.C. is an employee of Biogen. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Yixuan Ye, Shing Wan Choi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Prediction accuracy of different polygenic prediction methods across different genetic architectures.

Phenotypes were simulated using 0.1%, 1% or 10% of randomly sampled causal variants (shared across populations), a cross-population genetic correlation of 0.7, and SNP heritability of 50%. PRS were trained using 100 K EUR samples and 20 K non-EUR (EAS or AFR) samples. Numerical results are reported in Supplementary Table 2.
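
The simulation design described here (a fraction of causal variants shared across populations, a cross-population genetic correlation, and a fixed SNP heritability) can be sketched roughly as follows. This is a simplified illustration, not the authors' simulation code: it assumes standardized genotypes and independent SNPs, so that heritability equals the sum of per-SNP effect variances, and it draws the effects at each causal variant from a bivariate normal distribution. The function name `simulate_effects` and all of its parameters are hypothetical.

```python
import numpy as np

def simulate_effects(n_snps, causal_frac, rho, h2, seed=0):
    """Draw per-SNP effects for two populations at shared causal variants.

    n_snps:      total number of SNPs.
    causal_frac: fraction of SNPs that are causal (e.g. 0.001, 0.01 or 0.1).
    rho:         cross-population genetic correlation of causal effects.
    h2:          SNP heritability, assumed equal in both populations here.
    Returns (beta, causal_idx) with beta of shape (2, n_snps).
    """
    rng = np.random.default_rng(seed)
    n_causal = int(round(n_snps * causal_frac))
    # The same causal positions are used in both populations.
    causal_idx = rng.choice(n_snps, size=n_causal, replace=False)
    # Per-SNP variance h2 / n_causal so the effect variances sum to h2
    # (valid only under the no-LD, standardized-genotype assumption).
    cov = (h2 / n_causal) * np.array([[1.0, rho], [rho, 1.0]])
    draws = rng.multivariate_normal(np.zeros(2), cov, size=n_causal)
    beta = np.zeros((2, n_snps))
    beta[:, causal_idx] = draws.T
    return beta, causal_idx

# Example: 10% causal variants, genetic correlation 0.7, heritability 50%.
beta, causal_idx = simulate_effects(n_snps=100_000, causal_frac=0.1, rho=0.7, h2=0.5)
```

Varying `causal_frac`, `rho` or `h2` in this sketch corresponds to the kinds of scenarios contrasted across the Extended Data figures, although the published simulations were run on genotypes with realistic LD.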

Extended Data Fig. 2 Prediction accuracy of different polygenic prediction methods across different cross-population genetic correlations.

Phenotypes were simulated using 1% of randomly sampled causal variants (shared across populations), a cross-population genetic correlation of 0.4, 0.7 or 1.0, and SNP heritability of 50%. PRS were trained using 100 K EUR samples and 20 K non-EUR (EAS or AFR) samples. Numerical results are reported in Supplementary Table 3.

Extended Data Fig. 3 Prediction accuracy of different polygenic prediction methods across different discovery GWAS sample sizes.

Phenotypes were simulated using 1% of randomly sampled causal variants (shared across populations), a cross-population genetic correlation of 0.7, and SNP heritability of 50%. PRS were trained using 50 K EUR and 10 K non-EUR (EAS or AFR) samples, 100 K EUR and 20 K non-EUR samples, 200 K EUR and 40 K non-EUR samples, or 300 K EUR and 60 K non-EUR samples. Numerical results are reported in Supplementary Table 4.

Extended Data Fig. 4 Prediction accuracy of different polygenic prediction methods across different ratios of EUR vs. non-EUR GWAS sample sizes.

Phenotypes were simulated using 1% of randomly sampled causal variants (shared across populations), a cross-population genetic correlation of 0.7, and SNP heritability of 50%. PRS were trained using 120 K EUR samples without non-EUR samples, 100 K EUR and 20 K non-EUR (EAS or AFR) samples, 80 K EUR and 40 K non-EUR samples, or 60 K EUR and 60 K non-EUR samples. Numerical results are reported in Supplementary Table 5.

Extended Data Fig. 5 Prediction accuracy of different polygenic prediction methods across different SNP heritability.

Phenotypes were simulated using 1% of randomly sampled causal variants (shared across populations) and a cross-population genetic correlation of 0.7. SNP heritability was fixed at 50% in each population, 50% in the EUR population and 25% in the non-EUR population, or 25% in the EUR population and 50% in the non-EUR population. PRS were trained using 100 K EUR samples and 20 K non-EUR (EAS or AFR) samples. Numerical results are reported in Supplementary Table 6.

Extended Data Fig. 6 Prediction accuracy of different polygenic prediction methods across different proportions of shared causal variants between populations.

Phenotypes were simulated using 1% of randomly sampled causal variants. 100%, 70% or 40% of the causal variants were shared across populations. Shared causal variants had a cross-population genetic correlation of 0.7. SNP heritability was fixed at 50%. PRS were trained using 100 K EUR samples and 20 K non-EUR (EAS or AFR) samples. Numerical results are reported in Supplementary Table 7.

Extended Data Fig. 7 Prediction accuracy of different polygenic prediction methods when SNP effect sizes are minor allele frequency (MAF) and linkage disequilibrium (LD) dependent.

Phenotypes were simulated using 1% of randomly sampled causal variants (shared across populations), a cross-population genetic correlation of 0.7, and SNP heritability of 50%. SNP effect sizes were dependent on MAF and LD scores such that SNPs with lower MAF and located in lower LD regions tended to have larger effect sizes. PRS were trained using 100 K EUR samples and 20 K non-EUR (EAS or AFR) samples. Numerical results are reported in Supplementary Table 8.

Extended Data Fig. 8 Relative prediction accuracy for quantitative traits across target populations.

Relative prediction performance for single-discovery and multi-discovery PRS construction methods using discovery GWAS summary statistics a, from UKBB and BBJ, across 33 traits, in different UKBB target populations (EUR, EAS and AFR); b, from UKBB and BBJ, across 21 traits, in the Taiwan Biobank (TWB); c, from UKBB, BBJ and PAGE, across 14 traits, in different UKBB target populations (EUR, EAS and AFR). Each data point shows the relative increase of prediction performance, defined as R²/R²_{PRS-CS (UKBB)-EUR} - 1, in which R²_{PRS-CS (UKBB)-EUR} is the R² of the trait in the EUR population using PRS-CS trained on the UKBB GWAS summary statistics. In UKBB target populations (panels a and c), R² were averaged across 100 random splits of the target samples into validation and testing datasets. The crossbar indicates the median of the relative increase of predictive performance across the traits examined. ‘median N’ indicates the median sample size across the respective discovery GWAS.
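
The relative-increase metric defined in this legend is a simple ratio, and a short sketch may make the computation concrete. The helper names and the example R² values below are hypothetical; only the formula (R² divided by the reference R², minus 1), the averaging over random splits, and the use of the median across traits come from the legend.

```python
import numpy as np

def mean_r2_over_splits(r2_by_split):
    """Average R² over random validation/testing splits.

    r2_by_split: array (n_splits, n_traits); returns array (n_traits,).
    """
    return np.mean(np.asarray(r2_by_split, dtype=float), axis=0)

def relative_increase(r2, r2_reference):
    """Relative increase of prediction performance: R² / R²_reference - 1."""
    r2 = np.asarray(r2, dtype=float)
    r2_reference = np.asarray(r2_reference, dtype=float)
    return r2 / r2_reference - 1.0

# Made-up R² values for three traits, compared with a reference R² of 0.10
# (standing in for PRS-CS trained on UKBB and evaluated in EUR).
rel = relative_increase([0.05, 0.10, 0.20], [0.10, 0.10, 0.10])
median_rel = np.median(rel)  # the quantity marked by the crossbar
```

A value of 0 means the method matches the reference, negative values mean lower accuracy than the reference, and 1.0 means the R² is doubled.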

Extended Data Fig. 9 Trace plots and autocorrelation functions (ACFs) for assessing the convergence and mixing of the Gibbs sampler used in PRS-CSx.

Left panels: Trace plots, after discarding the burn-in iterations and thinning the Markov chain by a factor of 5, for the posterior effects of rs7412 on low-density lipoprotein cholesterol when integrating UKBB, BBJ and PAGE GWAS summary statistics using PRS-CSx. Right panels: The autocorrelation functions (ACFs) for the traces shown on the left.
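
The diagnostics described in this legend, discarding burn-in, thinning by a factor of 5, and computing the autocorrelation function of the retained samples, can be sketched as below. This is a generic illustration of the procedure rather than the code used to produce the figure; the function names and the burn-in length in the example are assumptions.

```python
import numpy as np

def thin_chain(samples, burn_in, thin=5):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    samples = np.asarray(samples, dtype=float)
    return samples[burn_in::thin]

def autocorrelation(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag, normalized so lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Example with a stand-in chain of 100 draws and an assumed burn-in of 50.
chain = np.arange(100)
kept = thin_chain(chain, burn_in=50, thin=5)

# A strictly alternating series has strongly negative lag-1 autocorrelation.
acf = autocorrelation([1, -1, 1, -1, 1, -1, 1, -1], max_lag=2)
```

For a well-mixing chain, the autocorrelation of the thinned samples should decay quickly toward zero as the lag increases, which is what the right-hand panels are meant to show.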

About this article

Cite this article

Ruan, Y., Lin, YF., Feng, YC.A. et al. Improving polygenic prediction in ancestrally diverse populations. Nat Genet 54, 573–580 (2022). https://doi.org/10.1038/s41588-022-01054-7

Received: 21 December 2020
Accepted: 16 March 2022
Published: 05 May 2022
Issue Date: May 2022
DOI: https://doi.org/10.1038/s41588-022-01054-7
