Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification.

Laura L Faye, Mitchell J Machiela, Peter Kraft, Shelley B Bull, Lei Sun
Author Information
  1. Laura L Faye: Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.

Abstract

Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.
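The two biases described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' re-ranking procedure and does not reproduce their R scripts. All function names, SNP labels, and parameter values are hypothetical and chosen only for illustration. The first part assumes that the expected chi-square non-centrality at a SNP scales with its LD r^2 to the causal SNP and with the squared correlation between true and called genotypes. The second part assumes bivariate normal z-statistics for a causal SNP and a tag SNP, and compares how often the tag outranks the causal SNP overall versus only in replicates where the tag passed a significance threshold of roughly p < 5e-8.

```python
import random


def expected_ncp(ncp_causal, ld_r2, quality_r2):
    """Approximate expected chi-square non-centrality at one SNP.

    Assumption: signal scales with LD r^2 to the causal SNP and with the
    squared correlation between true and called (or imputed) genotypes.
    """
    return ncp_causal * ld_r2 * quality_r2


# Hypothetical region: the causal SNP is imputed, the tag SNP is genotyped.
snps = {
    "causal_imputed": {"ld_r2": 1.0, "quality_r2": 0.6},
    "tag_genotyped": {"ld_r2": 0.8, "quality_r2": 1.0},
    "neighbor_imputed": {"ld_r2": 0.5, "quality_r2": 0.9},
}
expected = {name: expected_ncp(30.0, **v) for name, v in snps.items()}
ranking = sorted(expected, key=expected.get, reverse=True)


def tag_outranks_causal(mu=5.0, r=0.9, threshold=5.45, n=200_000, seed=1):
    """Estimate how often the tag SNP outranks the causal SNP.

    Returns (overall fraction, fraction among replicates in which the tag
    SNP exceeded the threshold). mu is the mean z-score at the causal SNP
    and r is the genotype correlation between tag and causal SNP.
    """
    rng = random.Random(seed)
    wins = selected = selected_wins = 0
    for _ in range(n):
        e1 = rng.gauss(0, 1)
        e2 = rng.gauss(0, 1)
        z_causal = mu + e1
        # The tag z-score has mean r * mu and correlation r with z_causal.
        z_tag = r * mu + r * e1 + (1 - r * r) ** 0.5 * e2
        win = abs(z_tag) > abs(z_causal)
        wins += win
        if abs(z_tag) > threshold:
            selected += 1
            selected_wins += win
    return wins / n, selected_wins / selected


overall, given_hit = tag_outranks_causal()
print("expected ranking:", ranking)
print("P(tag outranks causal), all replicates:", round(overall, 3))
print("P(tag outranks causal), tag was a GWAS hit:", round(given_hit, 3))
```

Under these assumptions the well-genotyped tag SNP has the largest expected test statistic even though it is not causal, and conditioning on the tag having been a GWAS hit raises the chance that it outranks the causal SNP. A re-ranking approach of the kind the abstract describes would need to correct for both effects.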

Grants

  1. MOP-84287/CIHR
  2. 84287-1/Canadian Institutes of Health Research
  3. U01 CA098216/NCI NIH HHS
  4. T32 GM074897/NIGMS NIH HHS
  5. U01 CA098233/NCI NIH HHS
  6. MDR-88001/CIHR
  7. /Wellcome Trust
  8. U01 CA098710/NCI NIH HHS
  9. U01-CA98233/NCI NIH HHS
  10. U01-CA98710/NCI NIH HHS
  11. U01-CA98216/NCI NIH HHS
  12. 076113/Wellcome Trust
  13. GET-101831/CIHR
  14. T32-GM074897/NIGMS NIH HHS
  15. 84287-2/Canadian Institutes of Health Research

MeSH Terms

Breast Neoplasms
Female
Genome-Wide Association Study
Genotype
High-Throughput Nucleotide Sequencing
Humans
Male
Models, Theoretical
Polymorphism, Single Nucleotide
Prostatic Neoplasms
Sample Size
