RefRGim: an intelligent reference panel reconstruction method for genotype imputation with convolutional neural networks.

Shuo Shi, Qiheng Qian, Shuhuan Yu, Qi Wang, Jinyue Wang, Jingyao Zeng, Zhenglin Du, Jingfa Xiao
Author Information
  1. Shuo Shi: National Genomics Data Center of Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
  2. Qiheng Qian: National Genomics Data Center of Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
  3. Shuhuan Yu: National Genomics Data Center of Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
  4. Qi Wang: Qujiang culture finance holding (Group) Co., Ltd, Xian, China.
  5. Jinyue Wang: National Genomics Data Center of Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
  6. Jingyao Zeng: National Genomics Data Center of Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
  7. Zhenglin Du: National Genomics Data Center of Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
  8. Jingfa Xiao: National Genomics Data Center of Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.

Abstract

Genotype imputation is a statistical method for estimating missing genotypes using a denser haplotype reference panel. Existing methods usually perform well on common variants but may be less accurate for low-frequency and rare variants. Previous studies have shown that the population similarity between the study and reference panels is one of the key factors influencing imputation accuracy. Here, we developed an imputation reference panel reconstruction method (RefRGim) using convolutional neural networks (CNNs), which generates a study-specific reference panel for each input dataset based on the genetic similarity between individuals in the current study and those in the reference panel. The CNNs were pretrained with single nucleotide polymorphism data from the 1000 Genomes Project. Our evaluations showed that genotype imputation with RefRGim can achieve higher accuracy than imputation with the original reference panel, especially for low-frequency and rare variants. RefRGim will serve as an efficient reference panel reconstruction method for genotype imputation. RefRGim is freely available via GitHub: https://github.com/shishuo16/RefRGim.
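
The idea of a study-specific reference panel can be illustrated with a minimal sketch. This is not RefRGim's CNN-based procedure: as a stand-in, it ranks reference individuals by a simple mean allele-dosage distance to the study samples and keeps the closest ones. The function names, the 0/1/2 dosage encoding, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical encoding: sample ID -> allele dosages (0, 1 or 2), one per SNP.
Genotypes = Dict[str, List[int]]


def mean_dosage_distance(a: List[int], b: List[int]) -> float:
    """Mean absolute difference in allele dosage across SNPs."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same SNPs")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_reference_subset(study: Genotypes, reference: Genotypes, k: int) -> List[str]:
    """Return the k reference sample IDs closest, on average, to the study samples.

    Illustrative stand-in only: a plain distance ranking, not a neural network.
    """
    scores = {}
    for ref_id, ref_gt in reference.items():
        dists = [mean_dosage_distance(ref_gt, gt) for gt in study.values()]
        scores[ref_id] = sum(dists) / len(dists)
    # Sort by distance, breaking ties by ID so the result is deterministic.
    return sorted(scores, key=lambda rid: (scores[rid], rid))[:k]


# Toy data (made up): two study samples and four reference samples at four SNPs.
study = {"s1": [0, 1, 2, 0], "s2": [0, 1, 1, 0]}
reference = {
    "refA": [0, 1, 2, 0],
    "refB": [2, 1, 0, 2],
    "refC": [0, 1, 1, 1],
    "refD": [2, 2, 0, 2],
}
subset = select_reference_subset(study, reference, k=2)
print(subset)  # ['refA', 'refC']
```

The selected IDs would then define the reduced panel passed to an imputation tool; how RefRGim itself scores similarity and sizes the panel is described in the paper and repository, not in this sketch.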

MeSH Terms

Algorithms
Computational Biology
Databases, Genetic
Deep Learning
Genetics, Population
Genome-Wide Association Study
Genotype
Genotyping Techniques
Humans
Neural Networks, Computer
Reproducibility of Results
Software
Web Browser
