CMDB: the comprehensive population genome variation database of China.

Zhichao Li, Xiaosen Jiang, Mingyan Fang, Yong Bai, Siyang Liu, Shujia Huang, Xin Jin
Author Information
  1. Zhichao Li: College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
  2. Xiaosen Jiang: College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
  3. Mingyan Fang: BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
  4. Yong Bai: BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
  5. Siyang Liu: BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
  6. Shujia Huang: BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
  7. Xin Jin: BGI-Shenzhen, Shenzhen 518083, Guangdong, China.

Abstract

A high-quality genome variation database derived from a large-scale population is a key piece of infrastructure for genomics and for clinical and translational medicine research. Here, we developed the Chinese Millionome Database (CMDB), a database that contains 9.04 million single nucleotide variants (SNVs) with allele frequency information derived from low-coverage (0.06×-0.1×) whole-genome sequencing (WGS) data of 141 431 unrelated healthy Chinese individuals. These individuals were recruited from 31 of the 34 administrative divisions in China and cover the Han and 36 ethnic minority groups. Housing WGS data from a multi-ethnic, geographically widespread Chinese population, CMDB is the most representative and comprehensive Chinese population genome database to date. Researchers can quickly search by variant, gene or genomic region to obtain variant information, including basic variant details, allele frequency, genic annotation and an overview of frequencies in global populations. Furthermore, CMDB provides information on the association of variants with a range of phenotypes, including height, BMI, maternal age and twin pregnancy. Based on these data, researchers can conduct meta-analyses of related phenotypes. CMDB is freely available at https://db.cngb.org/cmdb/.
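
As an illustration of the meta-analysis use case mentioned in the abstract, the sketch below pools per-study effect estimates for a single variant with a standard inverse-variance fixed-effect model. It is a minimal example under stated assumptions, not the authors' pipeline: the abstract does not describe the format of CMDB's association data, so the function name `fixed_effect_meta`, the `(beta, standard error)` input layout and the numbers are all hypothetical.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(studies: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Pool (beta, standard_error) pairs by inverse-variance weighting.

    Returns the pooled effect, its standard error and the z-score.
    """
    if not studies:
        raise ValueError("at least one study is required")
    if any(se <= 0 for _, se in studies):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its variance (1 / se^2).
    weights = [1.0 / (se * se) for _, se in studies]
    total_weight = sum(weights)

    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, studies)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical summary statistics for one variant and one phenotype:
# one estimate standing in for a CMDB association, one for an external cohort.
example_studies = [(0.10, 0.02), (0.14, 0.04)]
pooled_beta, pooled_se, z_score = fixed_effect_meta(example_studies)
print(f"pooled beta={pooled_beta:.4f}, se={pooled_se:.4f}, z={z_score:.2f}")
```

A fixed-effect model assumes that all studies estimate the same underlying effect. If the cohorts differ substantially in ancestry or phenotype definition, a random-effects model may be more appropriate.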

MeSH Term

Humans
Gene Frequency
Mutation
Databases, Genetic
China
East Asian People
Genetic Variation
Genetics, Population
