Sample size for detecting differentially expressed genes in microarray experiments.

Caimiao Wei, Jiangning Li, Roger E Bumgarner
Author Information
  1. Caimiao Wei: Department of Microbiology, University of Washington, Seattle, WA 98195, USA. caimiaow@u.washington.edu

Abstract

BACKGROUND: Microarray experiments are often performed with a small number of biological replicates, resulting in low statistical power for detecting differentially expressed genes and concomitantly high false-positive rates. Increasing sample size raises statistical power and lowers error rates, but using too many samples wastes valuable resources. The question of how many replicates a typical experimental system requires therefore needs to be addressed. Of particular interest is the difference in required sample sizes for similar experiments in inbred vs. outbred populations (e.g. mouse and rat vs. human).
RESULTS: We hypothesize that if all other factors (assay protocol, microarray platform, data pre-processing) were equal, fewer individuals would be needed for the same statistical power using inbred animals as opposed to unrelated human subjects, as genetic effects on gene expression will be removed in the inbred populations. We apply the same normalization algorithm and estimate the variance of gene expression for a variety of cDNA data sets (humans, inbred mice and rats) comparing two conditions. Using one-sample, paired-sample, or two-independent-sample t-tests, we calculate the sample sizes required to detect 1.5-, 2-, and 4-fold changes in expression level as a function of false positive rate, power, and the percentage of genes that have a standard deviation below a given percentile.
CONCLUSIONS: Factors that affect power and sample size calculations include the variability of the population, the desired detectable difference, the power to detect that difference, and the acceptable error rate. In addition, experimental design, technical variability, and data pre-processing play a role in the power of statistical tests on microarray data. We show that the number of samples required to detect a 2-fold change with 90% probability at a p-value threshold of 0.01 in humans is much larger than the number of samples commonly used in present-day studies, and that far fewer individuals are needed for the same statistical power when using inbred animals rather than unrelated human subjects.
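
ILLUSTRATION: The factors listed above (variability, detectable fold change, power, and error rate) can be tied together in a rough sample size calculation. The sketch below is not the authors' procedure: it uses the standard normal approximation to the t-test rather than an exact t-based calculation, it assumes expression is analyzed on the log2 scale, and the function name `samples_needed`, its parameters, and the standard deviations in the usage lines are hypothetical choices made for illustration only.

```python
import math
from statistics import NormalDist


def samples_needed(fold_change, sd_log2, alpha=0.01, power=0.90,
                   design="two-sample"):
    """Approximate sample size for detecting a given fold change.

    Normal approximation to the t-test (an assumption of this sketch):
      one-sample / paired:  n = ((z_{1-alpha/2} + z_{power}) * sd / delta)^2
      two-sample:           n per group = 2 * (same quantity)
    where delta = log2(fold_change) and sd is the standard deviation of
    log2 expression (of the paired differences for the paired design).
    """
    if fold_change <= 1:
        raise ValueError("fold_change must be greater than 1")
    if sd_log2 <= 0:
        raise ValueError("sd_log2 must be positive")

    delta = math.log2(fold_change)          # detectable difference, log2 scale
    z = NormalDist()                        # standard normal distribution
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)

    n = (z_sum * sd_log2 / delta) ** 2
    if design == "two-sample":
        n *= 2                              # two independent groups
    elif design not in ("one-sample", "paired"):
        raise ValueError("design must be 'one-sample', 'paired' or 'two-sample'")

    # Round up: a fractional sample is not possible. For very small n the
    # normal approximation is optimistic; an exact t-based method needs more.
    return math.ceil(n)


# Hypothetical standard deviations, chosen only to contrast a low-variance
# population with a high-variance one; they are not estimates from the paper.
for label, sd in [("low variability", 0.3), ("high variability", 0.8)]:
    n = samples_needed(fold_change=2, sd_log2=sd, alpha=0.01, power=0.90)
    print(f"{label}: about {n} samples per group")
```

Because the required n grows with the square of the standard deviation, a population with even moderately higher expression variability needs many more replicates for the same power, which is the qualitative pattern the abstract describes for unrelated human subjects versus inbred animals.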

Grants

  1. 1P50HL07399/NHLBI NIH HHS
  2. 1U19ES011387/NIEHS NIH HHS
  3. 5U24DK058813/NIDDK NIH HHS
  4. P30 DA015625/NIDA NIH HHS
  5. U19 ES011387/NIEHS NIH HHS
  6. 1R21AI052028/NIAID NIH HHS
  7. P01 AI052106/NIAID NIH HHS
  8. 5P30DA015625/NIDA NIH HHS
  9. 5R01HL072370/NHLBI NIH HHS
  10. 5P01AI052106/NIAID NIH HHS
  11. R01 HL072370/NHLBI NIH HHS

MeSH Terms

Animals
Biomarkers, Tumor
Carcinoma, Hepatocellular
Gene Expression Profiling
Genes
Hepacivirus
Humans
Liver
Liver Neoplasms
Mice
Mice, Inbred Strains
Models, Statistical
Oligonucleotide Array Sequence Analysis
RNA
RNA, Neoplasm
Rats
Rats, Inbred Strains
Sample Size

Chemicals

Biomarkers, Tumor
RNA, Neoplasm
RNA
