VCGDB: a dynamic genome database of the Chinese population.

Yunchao Ling, Zhong Jin, Mingming Su, Jun Zhong, Yongbing Zhao, Jun Yu, Jiayan Wu, Jingfa Xiao
Author Information
  1. Jun Yu: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China. junyu@big.ac.cn.

Abstract

BACKGROUND: The data released by the 1000 Genomes Project contain a growing number of genome sequences from different nations and populations, along with a large number of genetic variations. As a result, the focus of human genome studies is shifting from single, static genomes to complex, dynamic ones. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome-wide association studies.
DESCRIPTION: We used the sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole-genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information comprising 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive, user-friendly virtual Chinese genome browser (VCGBrowser) with features such as seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software.
CONCLUSIONS: VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion. It provides a robust resource for genomics studies, particularly studies aimed at finding regions of the genome associated with diseases.
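
The idea of a population-specific consensus reference genome, mentioned in the description above, can be illustrated with a toy sketch. The abstract does not say how the VCGDB consensus genomes were built, so the following is only an assumption for illustration: it handles SNVs only and substitutes a reference base when a different allele is carried by more than half of the sampled chromosomes. The function name, the input format, and the `min_freq` threshold are all hypothetical.

```python
from collections import Counter


def consensus_sequence(reference, genotypes, min_freq=0.5):
    """Replace reference bases with the population's major allele.

    reference: reference sequence as a string.
    genotypes: dict mapping a 0-based position to the list of alleles
               observed at that site, one per sampled chromosome
               (hypothetical input format, SNVs only).
    min_freq:  an allele must exceed this frequency to replace the
               reference base (assumed threshold, not from the paper).
    """
    bases = list(reference)
    for pos, alleles in genotypes.items():
        # Most frequent allele observed at this site.
        allele, count = Counter(alleles).most_common(1)[0]
        if allele != bases[pos] and count / len(alleles) > min_freq:
            bases[pos] = allele
    return "".join(bases)


reference = "ACGTACGT"
genotypes = {
    1: ["T", "T", "T", "C"],  # T carried by 75% of chromosomes: substituted
    5: ["C", "C", "G", "C"],  # reference base C is already the major allele
    6: ["A", "G", "A", "G"],  # 50/50 split: not above threshold, unchanged
}
consensus = consensus_sequence(reference, genotypes)
print(consensus)
```

A real pipeline would also have to handle indels, which shift coordinates, and would read variant calls from files rather than from an in-memory dictionary.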

MeSH Terms

Asian People
China
Chromosome Mapping
Computational Biology
Databases, Nucleic Acid
Genetics, Population
Genome, Human
Genome-Wide Association Study
Genomics
Humans
Polymorphism, Single Nucleotide
Search Engine
Web Browser

Links to CNCB-NGDC Resources

Database Commons: DBC000161 (VCGDB)
