Fast and covariate-adaptive method amplifies detection power in large-scale multiple hypothesis testing.

Martin J Zhang, Fei Xia, James Zou
Author Information
  1. Martin J Zhang: Department of Electrical Engineering, Stanford University, Palo Alto, 94304, USA.
  2. Fei Xia: Department of Electrical Engineering, Stanford University, Palo Alto, 94304, USA.
  3. James Zou: Department of Electrical Engineering, Stanford University, Palo Alto, 94304, USA. jamesz@stanford.edu.

Abstract

Multiple hypothesis testing is an essential component of modern data science. In many settings, covariates are available for each hypothesis in addition to its p-value, e.g., functional annotation of variants in genome-wide association studies. Popular multiple testing approaches such as the Benjamini-Hochberg procedure (BH) ignore this information. Here we introduce AdaFDR, a fast and flexible method that adaptively learns the optimal p-value threshold from covariates to significantly improve detection power. On eQTL analysis of the GTEx data, AdaFDR discovers 32% more associations than BH at the same false discovery rate. We prove that AdaFDR controls the false discovery proportion, and we show in extensive experiments that it makes substantially more discoveries while controlling the false discovery rate (FDR). AdaFDR is computationally efficient and allows multi-dimensional covariates with both numeric and categorical values, making it broadly useful across many applications.
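
For readers unfamiliar with the BH baseline named in the abstract, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is not AdaFDR and is not taken from the authors' code; the function name `benjamini_hochberg` and the parameter `alpha` are illustrative assumptions. It shows the point the abstract makes about BH: one threshold is applied to all p-values, and no covariate enters the decision.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.1):
    """Standard BH step-up procedure (baseline only, not AdaFDR).

    Returns a boolean array, in the input order, marking the
    hypotheses rejected at nominal FDR level `alpha`.
    """
    p = np.asarray(p_values, dtype=float)
    n = p.size
    if n == 0:
        return np.zeros(0, dtype=bool)

    # Sort the p-values and compare the i-th smallest to alpha * i / n.
    order = np.argsort(p)
    sorted_p = p[order]
    thresholds = alpha * np.arange(1, n + 1) / n
    below = sorted_p <= thresholds

    rejected = np.zeros(n, dtype=bool)
    if below.any():
        # Step-up rule: find the largest rank k that passes, then
        # reject every hypothesis with rank <= k, even if some of
        # those ranks individually fail their own comparison.
        k = np.max(np.nonzero(below)[0])
        rejected[order[: k + 1]] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042,
               0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, alpha=0.05))
```

Because the rejection rule depends only on the ranks of the p-values, two hypotheses with the same p-value are always treated identically. A covariate-adaptive method such as the one described above instead lets the threshold vary with the covariates.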

Grants

  1. P30 AG059307/NIA NIH HHS

MeSH Terms

Algorithms
Data Interpretation, Statistical
Genome-Wide Association Study
Humans
Magnetic Resonance Imaging
Microbiota
Polymorphism, Single Nucleotide
Proteomics
Quantitative Trait Loci
Research Design
Sequence Analysis, RNA
