Nonparametric methods for identifying differentially expressed genes in microarray data.

Olga G Troyanskaya, Mitchell E Garber, Patrick O Brown, David Botstein, Russ B Altman
Author Information
  1. Olga G Troyanskaya: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.

Abstract

MOTIVATION: Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) a nonparametric t-test, (2) the Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene (the 'ideal discriminator method'). We systematically assess the performance of each method on simulated and biological data under varying noise levels and p-value cutoffs.
RESULTS: All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with a noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of the groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
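
The three statistics named in the abstract can be illustrated for a single gene with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how p-values are obtained, so the sketch assumes a shared permutation null for all three statistics, reads the "nonparametric t-test" as a Welch t statistic with permutation p-values, and models the "perfectly differentiating gene" as a 1/0 template over the two groups. All function names and the example values are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic with unequal variances (0.0 if there is no spread)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def centered_rank_sum(a, b):
    """Rank sum of group a minus its expected value under no difference.

    Ties receive the average of the ranks they span.
    """
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    expected = len(a) * (len(pooled) + 1) / 2
    return sum(ranks[x] for x in a) - expected


def ideal_discriminator_r(a, b):
    """Pearson correlation between the gene and an assumed 1/0 group template."""
    x = a + b
    y = [1.0] * len(a) + [0.0] * len(b)
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def permutation_p(a, b, stat, n_perm=2000, seed=0):
    """Two-sided p-value for stat(a, b) from random relabeling of the samples.

    Assumption: the abstract does not describe the null distribution used.
    """
    rng = random.Random(seed)
    observed = abs(stat(a, b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(stat(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Hypothetical expression values for one gene in two groups of six arrays.
group_a = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
group_b = [1.0, 1.2, 0.9, 1.1, 0.8, 1.3]

for name, stat in [("t-test", welch_t),
                   ("rank sum", centered_rank_sum),
                   ("ideal discriminator", ideal_discriminator_r)]:
    print(name, permutation_p(group_a, group_b, stat))
```

In practice each statistic would be computed for every gene on the array, and genes whose p-value falls below the chosen cutoff would be reported. Raising the cutoff gives the more inclusive list mentioned in the RESULTS paragraph.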

Grants

  1. CA77097/NCI NIH HHS
  2. CA85129/NCI NIH HHS
  3. GM61374/NIGMS NIH HHS
  4. LM06244/NLM NIH HHS

MeSH Terms

Carcinoma, Squamous Cell
Computer Simulation
False Positive Reactions
Gene Expression
Gene Expression Profiling
Humans
Lung Neoplasms
Lymphoma, B-Cell
Models, Genetic
Models, Statistical
Oligonucleotide Array Sequence Analysis
Reference Values
Reproducibility of Results
Sensitivity and Specificity
Sequence Alignment
Sequence Analysis, DNA
Statistics, Nonparametric
