Accounting for population stratification in practice: a comparison of the main strategies dedicated to genome-wide association studies.

Matthieu Bouaziz, Christophe Ambroise, Mickael Guedj
Author Information
  1. Matthieu Bouaziz: Department of Biostatistics, Pharnext, Paris, France. matthieu.x.bouaziz@gmail.com

Abstract

Genome-Wide Association Studies are powerful tools to detect genetic variants associated with diseases. Their results have, however, been questioned, in part because of the bias induced by population stratification. This bias arises from systematic differences in allele frequencies due to differences in sample ancestry, and it can lead to both false positive and false negative findings. Many strategies are available to account for stratification, but their performance differs according to, for instance, the type of population structure, the minor allele frequency of the disease susceptibility locus, the degree of sampling imbalance, or the sample size. We focus on the type of population structure and compare the most commonly used methods for dealing with stratification: Genomic Control, Principal Component based methods such as those implemented in Eigenstrat, adjusted Regressions, and Meta-Analysis strategies. Our assessment of the methods is based on a large simulation study involving several scenarios that correspond to many types of population structure. We considered both false positive rate and power to determine which methods perform best. Our analysis showed that, in the absence of population structure, none of the tests introduced bias or decreased power, except for the Meta-Analyses. When the population is stratified, adjusted Logistic Regressions and Eigenstrat are the best solutions to account for stratification, although only the Logistic Regressions consistently maintain correct false positive rates. This study provides more details about these methods. Their advantages and limitations in different stratification scenarios are highlighted in order to propose practical guidelines for accounting for population stratification in Genome-Wide Association Studies.
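
As an illustration of the simplest of the strategies named in the abstract, the following is a minimal sketch of Genomic Control. It assumes 1-degree-of-freedom chi-square association statistics, estimates the inflation factor as the median observed statistic divided by the median of the chi-square distribution with 1 degree of freedom, and rescales every statistic by that factor. The function name and the input values are hypothetical and are not taken from the study; the floor at 1 is a common convention, assumed here.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (inflation factor, corrected statistics) for 1-df statistics."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: floor lambda at 1 so statistics are never inflated.
    lam = max(lam, 1.0)
    return lam, [s / lam for s in chi2_stats]


# Hypothetical per-SNP statistics, for illustration only.
stats = [0.2, 0.5, 0.91, 1.3, 6.0]
lam, corrected = genomic_control(stats)
```

Because a single factor is applied to all markers, this correction is uniform across the genome, which is one reason its behaviour may differ from that of the regression and principal component approaches under some population structures.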

MeSH Term

Gene Frequency
Genome-Wide Association Study
Humans
