Genetic structure of the Han Chinese population revealed by genome-wide SNP variation.

Jieming Chen, Houfeng Zheng, Jin-Xin Bei, Liangdan Sun, Wei-hua Jia, Tao Li, Furen Zhang, Mark Seielstad, Yi-Xin Zeng, Xuejun Zhang, Jianjun Liu
Author Information
  1. Jieming Chen: Human Genetics, Genome Institute of Singapore, Singapore 138672, Singapore.

Abstract

Population stratification can confound genome-wide association studies (GWAS) and produce spurious associations. Understanding how allele frequencies vary across geographic regions or among subpopulations is therefore an important prelude to analyzing GWAS data. Using over 350,000 genome-wide autosomal SNPs in over 6000 Han Chinese samples from ten provinces of China, our study revealed a one-dimensional "north-south" population structure and a close correlation between geography and the genetic structure of the Han Chinese. This north-south structure is consistent with the historical migration pattern of the Han Chinese population. Metropolitan cities in China, however, appeared as more diffuse "outliers," probably because of modern migration. At a very local scale within Guangdong province, we observed evidence of population structure among dialect groups, probably on account of endogamy within these groups. Via simulation, we show that the empirical levels of population structure observed across modern China can cause spurious associations in GWAS if not properly handled. In the Han Chinese, geographic matching is a good proxy for genetic matching. This is particularly useful in validation and candidate-gene studies, in which population stratification cannot be directly assessed and accounted for because genome-wide data are lacking. The exception is the metropolitan cities, where geographical location is no longer a good indicator of ancestral origin. Our findings are important for designing GWAS in the Chinese population, an activity that is expected to intensify greatly in the near future.
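
The effect of unmatched sampling on null SNPs can be illustrated with a small simulation. The sketch below is not the authors' simulation: it assumes a two-subpopulation Balding-Nichols model, a hypothetical F_ST of 0.01, and made-up sampling fractions, none of which come from the study. It compares the genomic inflation factor (median allelic chi-square divided by the median of a 1-df chi-square) for a design in which cases and controls are drawn from "north" and "south" in different proportions against a geographically matched design.

```python
import random
import statistics

CHI2_1DF_MEDIAN = 0.4549  # median of a chi-square distribution with 1 df


def allele_count(rng, n_people, freq):
    """Alternate-allele count among n_people diploid individuals."""
    return sum(rng.random() < freq for _ in range(2 * n_people))


def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square (1 df) on the 2x2 table of allele counts."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # monomorphic SNP: no evidence either way
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def inflation_factor(case_north_frac, ctrl_north_frac,
                     n_per_group=500, n_snps=600, fst=0.01, seed=1):
    """Genomic inflation factor over null SNPs (all values hypothetical)."""
    rng = random.Random(seed)
    n_case_north = round(n_per_group * case_north_frac)
    n_ctrl_north = round(n_per_group * ctrl_north_frac)
    stats = []
    for _ in range(n_snps):
        # Balding-Nichols: subpopulation frequencies scatter around an
        # ancestral frequency p, with spread controlled by fst.
        p = rng.uniform(0.1, 0.9)
        alpha = p * (1 - fst) / fst
        beta = (1 - p) * (1 - fst) / fst
        p_north = rng.betavariate(alpha, beta)
        p_south = rng.betavariate(alpha, beta)
        # The SNP has no effect on disease; only the sampling differs.
        case_alt = (allele_count(rng, n_case_north, p_north)
                    + allele_count(rng, n_per_group - n_case_north, p_south))
        ctrl_alt = (allele_count(rng, n_ctrl_north, p_north)
                    + allele_count(rng, n_per_group - n_ctrl_north, p_south))
        stats.append(allelic_chi2(case_alt, 2 * n_per_group,
                                  ctrl_alt, 2 * n_per_group))
    return statistics.median(stats) / CHI2_1DF_MEDIAN


# Cases mostly northern, controls mostly southern (unmatched design).
lambda_unmatched = inflation_factor(0.8, 0.2)
# Cases and controls drawn in the same north/south proportions (matched).
lambda_matched = inflation_factor(0.5, 0.5)
print(f"unmatched lambda: {lambda_unmatched:.2f}")
print(f"matched lambda:   {lambda_matched:.2f}")
```

Under these assumptions the unmatched design should give an inflation factor well above 1 even though no SNP is causal, while the matched design should stay close to 1. That is the sense in which geographic matching can stand in for genetic matching when genome-wide data are unavailable.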

MeSH Term

Algorithms
Asian People
China
Computer Simulation
Ethnicity
Genetic Variation
Genetics, Population
Genome
Genome-Wide Association Study
Humans
Models, Genetic
Polymorphism, Single Nucleotide
