Adaptive filtering of microarray gene expression data based on Gaussian mixture decomposition.

Michal Marczyk, Roman Jaksik, Andrzej Polanski, Joanna Polanska
Author Information
  1. Michal Marczyk: Institute of Automatic Control, Silesian University of Technology, Gliwice 44-100, Poland. Michal.Marczyk@polsl.pl

Abstract

BACKGROUND: DNA microarrays are used to discover genes that are differentially expressed between biological conditions. In microarray experiments the number of analyzed samples is often much lower than the number of genes (probe sets), which leads to many false discoveries. Multiple testing correction methods control the number of false discoveries but reduce the sensitivity of detecting differentially expressed genes. To address this problem, earlier papers proposed filtering methods that improve the power to detect differentially expressed genes. These techniques are two-step procedures: in the first step a pool of non-informative genes is removed, and in the second step only the retained genes are searched for differential expression.
RESULTS: A very important parameter to choose is the proportion between the sizes of the pools of removed and retained genes. We propose a new method that determines close-to-optimal threshold values for the sample means and sample variances used in gene filtering. The method is adaptive and is based on decomposing the histogram of gene expression means or variances into a mixture of Gaussian components.
CONCLUSIONS: By analyzing several publicly available datasets and simulated datasets, we demonstrate that our adaptive method increases the sensitivity of finding differentially expressed genes compared with previous methods of filtering microarray data that use fixed threshold values.
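
ILLUSTRATION: The abstract does not give the decomposition algorithm or the rule that turns fitted components into a threshold, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes exactly two Gaussian components (a low "non-informative" one and a high "informative" one), fits them to simulated per-gene mean expression values with a plain EM loop, and takes as the threshold the point between the two component means where the weighted densities are equal. The function names, the two-component choice, the crossing-point rule, and the simulated values are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=100):
    """Fit a 1-D two-component Gaussian mixture with EM.

    Returns two [weight, mean, sd] entries sorted by mean.
    Initialisation by a median split is an illustrative assumption.
    """
    xs = sorted(values)
    n = len(xs)
    params = []
    for part in (xs[: n // 2], xs[n // 2:]):
        mu = sum(part) / len(part)
        var = sum((x - mu) ** 2 for x in part) / len(part)
        params.append([0.5, mu, max(math.sqrt(var), 1e-6)])

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, mu, sd) for w, mu, sd in params]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and sd of each component
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nk
            params[k] = [nk / n, mu, max(math.sqrt(var), 1e-6)]
    return sorted(params, key=lambda p: p[1])


def crossing_threshold(low, high, steps=60):
    """Bisection for the point between the two means where the weighted
    densities are equal. Assumes the low component dominates at its own
    mean and the high component dominates at its own mean."""
    (w1, m1, s1), (w2, m2, s2) = low, high
    a, b = m1, m2
    for _ in range(steps):
        mid = (a + b) / 2.0
        if w1 * normal_pdf(mid, m1, s1) > w2 * normal_pdf(mid, m2, s2):
            a = mid
        else:
            b = mid
    return (a + b) / 2.0


# Simulated per-gene mean log-expression (hypothetical values):
# 1200 weakly expressed genes and 800 clearly expressed genes.
random.seed(0)
gene_means = [random.gauss(4.0, 0.5) for _ in range(1200)]
gene_means += [random.gauss(9.0, 1.0) for _ in range(800)]

low, high = fit_two_gaussians(gene_means)
threshold = crossing_threshold(low, high)

# Step 1 of the two-step procedure: keep only genes above the threshold.
retained = [m for m in gene_means if m > threshold]
print(f"threshold = {threshold:.2f}, retained {len(retained)} of {len(gene_means)} genes")
```

In this sketch the threshold adapts to the data because it is derived from the fitted components rather than fixed in advance. The retained genes would then go to the second step, the differential expression test with multiple testing correction, which is not shown here. The same functions could be applied to per-gene variances instead of means.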

MeSH Term

Algorithms
Animals
Gene Expression Profiling
Humans
Normal Distribution
Oligonucleotide Array Sequence Analysis
