Sample size reassessment for a two-stage design controlling the false discovery rate.

Sonja Zehetmayer, Alexandra C Graf, Martin Posch

Abstract

Sample size calculations for gene expression microarray and NGS-RNA-Seq experiments are challenging because the overall power depends on unknown quantities such as the proportion of true null hypotheses and the distribution of the effect sizes under the alternative. We propose a two-stage design with an adaptive interim analysis in which these quantities are estimated from the interim data. The second-stage sample size is then chosen based on these estimates to achieve a specified overall power. The proposed procedure controls the power in all considered scenarios except for very small first-stage sample sizes. The false discovery rate (FDR) is controlled despite the data-dependent choice of sample size. The two-stage design can be a useful tool for determining the sample size of high-dimensional studies if, in the planning phase, there is high uncertainty regarding the expected effect sizes and variability.
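
The general idea of an interim sample size reassessment can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state the estimators or the power formula, so everything below is an assumption made for illustration. The sketch assumes a Storey-type estimate of the proportion of true nulls, a single common standardized effect (`delta`, `sigma`), a two-sample z-test approximation, a final analysis that pools both stages, and the approximation FDR ≈ pi0·a / (pi0·a + (1 − pi0)·power) to convert an FDR level into a per-test level a. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def estimate_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true nulls (assumed estimator)."""
    m = len(p_values)
    frac_above = sum(p > lam for p in p_values) / m
    return min(1.0, frac_above / (1.0 - lam))


def per_test_level(fdr, pi0, power):
    """Per-test level a solving fdr = pi0*a / (pi0*a + (1 - pi0)*power)."""
    if not 0.0 < pi0 < 1.0:
        raise ValueError("pi0 must lie strictly between 0 and 1")
    return min(1.0, fdr * (1.0 - pi0) * power / (pi0 * (1.0 - fdr)))


def per_group_sample_size(delta, sigma, alpha, power):
    """Per-group n for a two-sided two-sample z-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def second_stage_size(n1, delta, sigma, fdr, pi0, power):
    """Additional per-group observations needed after a first stage of size n1.

    Assumes the final analysis pools both stages (an illustrative assumption).
    """
    alpha = per_test_level(fdr, pi0, power)
    n_total = per_group_sample_size(delta, sigma, alpha, power)
    return max(0, n_total - n1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls below its BH threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])
```

In such a sketch, `estimate_pi0` would be applied to the interim p-values, `delta` and `sigma` would be replaced by interim estimates, and `second_stage_size` would return the number of additional observations per group. Whether the FDR remains controlled after such a data-dependent choice is exactly the question the paper addresses; this sketch makes no claim about it.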

Grants

  1. P 23167/Austrian Science Fund FWF

MeSH Terms

Algorithms
Data Interpretation, Statistical
Gene Expression Profiling
High-Throughput Nucleotide Sequencing
Oligonucleotide Array Sequence Analysis
ROC Curve
Reproducibility of Results
Sample Size
Sequence Analysis, DNA
