Assessing numerical dependence in gene expression summaries with the jackknife expression difference.

John R Stevens, Gabriel Nicholas
Author Information
  1. John R Stevens: Department of Mathematics and Statistics, Center for Integrated Biosystems, Utah State University, Logan, Utah, United States of America. john.r.stevens@usu.edu

Abstract

Statistical methods to test for differential expression traditionally assume that each gene's expression summaries are independent across arrays. When certain preprocessing methods are used to obtain those summaries, this assumption does not necessarily hold. In general, erroneously assuming independence results in a loss of statistical power. We introduce a diagnostic measure of numerical dependence for gene expression summaries from any preprocessing method and discuss the relative performance of several common preprocessing methods with respect to this measure. Some common preprocessing methods introduce non-trivial levels of numerical dependence. The issue of (between-array) dependence has received little if any attention in the literature. Researchers working with gene expression data should not take the independence of their summaries for granted, or they risk unnecessarily losing statistical power.
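The abstract does not give a formula for the diagnostic. What follows is only a minimal sketch of the leave-one-array-out (jackknife) idea named in the title, and it rests on several assumptions. Each array's summaries are computed twice: once with all arrays preprocessed together, and once with one other array dropped before preprocessing. Quantile normalization stands in for a method that pools information across arrays. Per-array mean-centering stands in for a method that handles each array on its own. The per-pair summary (largest absolute change over genes) is our own choice. The function names are hypothetical, and this is not the authors' definition of the jackknife expression difference.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Stand-in for a preprocessing method that pools information across arrays.

    Each array is a list of gene values (all the same length). Every array is
    given the same distribution: the k-th smallest value in each array is
    replaced by the mean, across arrays, of the k-th smallest values.
    Ties are broken by gene order, a simplification.
    """
    n_genes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n_genes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_genes), key=lambda g: a[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def per_array_center(arrays):
    """Stand-in for a method that processes each array on its own:
    subtract each array's own mean from its values."""
    result = []
    for a in arrays:
        m = mean(a)
        result.append([x - m for x in a])
    return result


def jackknife_differences(arrays, preprocess):
    """Hypothetical leave-one-array-out diagnostic.

    For every ordered pair (i, j) with i != j, record the largest absolute
    change in array i's summaries when array j is dropped from the batch
    before preprocessing. All zeros means array i's summaries do not depend
    numerically on any other array.
    """
    full = preprocess(arrays)
    diffs = {}
    for j in range(len(arrays)):
        kept = [i for i in range(len(arrays)) if i != j]
        reduced = preprocess([arrays[i] for i in kept])
        for pos, i in enumerate(kept):
            diffs[(i, j)] = max(abs(x - y) for x, y in zip(full[i], reduced[pos]))
    return diffs


# Toy data: three arrays, three genes each (made up for illustration).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0]]
qn = jackknife_differences(arrays, quantile_normalize)
pc = jackknife_differences(arrays, per_array_center)
print(round(max(qn.values()), 3))  # 0.833
print(max(pc.values()))            # 0.0
```

On this toy input, every pair shows a nonzero change under quantile normalization, while per-array centering gives exactly zero throughout. This is the contrast the abstract points to: summaries from a method that pools arrays are numerically dependent across arrays, and summaries from an array-by-array method are not.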

MeSH Terms

Algorithms
Data Interpretation, Statistical
Data Mining
Gene Expression Profiling
Gene Expression Regulation
Oligonucleotide Array Sequence Analysis
