Exploiting identifiability and intergene correlation for improved detection of differential expression.

J R Deller, Hayder Radha, J Justin McCormick
Author Information
  1. J R Deller: Department of Electrical and Computer Engineering, Michigan State University, 2120 EB, East Lansing, MI 48824, USA.
  2. Hayder Radha: Department of Electrical and Computer Engineering, Michigan State University, 2120 EB, East Lansing, MI 48824, USA.
  3. J Justin McCormick: Department of Molecular Biology & Biochemistry, Carcinogenesis Laboratory, Michigan State University, 341 FST, East Lansing, MI 48824, USA.

Abstract

Accurate differential analysis of microarray data strongly depends on effective treatment of intergene correlation. Such dependence is ordinarily accounted for in terms of its effect on significance cutoffs. In this paper, it is shown that correlation can, in fact, be exploited to share information across tests and reorder expression differentials for increased statistical power, regardless of the threshold. Significantly improved differential analysis is the result of two simple measures: (i) adjusting test statistics to exploit information from identifiable genes (the large subset of genes represented on a microarray that can be classified a priori as nondifferential with very high confidence), but (ii) doing so in a way that accounts for linear dependencies among identifiable and nonidentifiable genes. A method is developed that builds upon the widely used two-sample t-statistic approach and uses analysis in Hilbert space to decompose the nonidentified gene vector into two components that are correlated and uncorrelated with the identified set. In the application to data derived from a widely studied prostate cancer database, the proposed method outperforms some of the most highly regarded approaches published to date. Algorithms in MATLAB and in R are available for public download.
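
The abstract names the decomposition but does not spell out the algorithm, so the following is only a minimal sketch of the general idea and not the authors' published MATLAB or R code. It assumes that each gene is represented by its vector of expression values across samples, that "correlated with the identified set" can be read as the orthogonal projection of the mean-centered gene vector onto the span of the mean-centered identified genes, and that the two-sample statistic is Welch's unequal-variance t. All function names, the toy data, and the choice to compute the statistic on the uncorrelated component are illustrative assumptions.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def center(v):
    """Subtract the sample mean so that orthogonality means zero sample correlation."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt; vectors that are (near) linearly dependent are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def decompose(x, identified):
    """Split the centered vector x into a part lying in the span of the
    centered identified genes and a remainder orthogonal to that span."""
    xc = center(x)
    basis = orthonormal_basis([center(g) for g in identified])
    correlated = [0.0] * len(xc)
    for q in basis:
        c = dot(xc, q)
        correlated = [p + c * qi for p, qi in zip(correlated, q)]
    uncorrelated = [a - p for a, p in zip(xc, correlated)]
    return correlated, uncorrelated


def welch_t(a, b):
    """Two-sample t-statistic with unequal variances (Welch); an assumed choice."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


# Hypothetical toy data: 6 samples, the first 3 from one condition.
identified = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
]
x = [1.1, 2.3, 2.9, 5.2, 6.1, 7.0]  # a nonidentified gene
n_first = 3

corr, unc = decompose(x, identified)
t_raw = welch_t(x[:n_first], x[n_first:])
t_adjusted = welch_t(unc[:n_first], unc[n_first:])
```

Centering is used here so that the inner product of two vectors is proportional to their sample covariance, which is one common way to make "uncorrelated" and "orthogonal" coincide. How the published method actually combines the two components into an adjusted statistic is not stated in the abstract and is not reproduced by this sketch.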

