Selecting differentially expressed genes from microarray experiments.

Margaret Sullivan Pepe, Gary Longton, Garnet L Anderson, Michel Schummer
Author Information
  1. Margaret Sullivan Pepe: Department of Biostatistics, University of Washington, Seattle, Washington 98195-7232, USA. mspepe@u.washington.edu

Abstract

High-throughput technologies, such as gene expression arrays and protein mass spectrometry, allow one to simultaneously evaluate thousands of potential biomarkers that could distinguish different tissue types. Of particular interest here is distinguishing between cancerous and normal organ tissues. We consider statistical methods to rank genes (or proteins) with regard to differential expression between tissues. Various statistical measures are considered, and we argue that two measures related to the receiver operating characteristic (ROC) curve are particularly suitable for this purpose. We also propose that sampling variability in the gene rankings be quantified, and suggest using the "selection probability function," the probability distribution of rankings for each gene. This is estimated via the bootstrap. A real dataset, derived from gene expression arrays of 23 normal and 30 ovarian cancer tissues, is analyzed. Simulation studies are also used to assess the relative performance of different statistical gene ranking measures and our quantification of sampling variability. Our approach leads naturally to a procedure for sample-size calculations, appropriate for exploratory studies that seek to identify differentially expressed genes.
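The ranking-plus-bootstrap idea in the abstract can be illustrated with a short sketch. It is not the authors' code. The abstract does not say which two ROC-related measures are used, so this example assumes one ROC-based statistic, the empirical area under the ROC curve (AUC, the Mann-Whitney form), and ranks genes by it. It then resamples tissues with replacement within each group and reports, for each gene, the bootstrap estimate of P(rank <= k). This is one point of the selection probability function. The function names (`auc_per_gene`, `rank_genes`, `selection_probability`) and the simulated data are hypothetical.

```python
import numpy as np

def auc_per_gene(cancer, normal):
    """Empirical AUC for every gene: P(cancer value > normal value) + 0.5 * P(tie).
    cancer: (n_cancer, n_genes), normal: (n_normal, n_genes).
    Assumption: higher expression in cancer counts as 'positive'."""
    diff = cancer[:, None, :] - normal[None, :, :]  # (n_cancer, n_normal, n_genes)
    return (diff > 0).mean(axis=(0, 1)) + 0.5 * (diff == 0).mean(axis=(0, 1))

def rank_genes(cancer, normal):
    """Return (aucs, ranks), where rank 1 is the gene with the largest AUC.
    Ties are broken by gene index (stable sort)."""
    aucs = auc_per_gene(cancer, normal)
    order = np.argsort(-aucs, kind="stable")
    ranks = np.empty(aucs.size, dtype=int)
    ranks[order] = np.arange(1, aucs.size + 1)
    return aucs, ranks

def selection_probability(cancer, normal, k, n_boot=200, seed=0):
    """Bootstrap estimate of P(rank of gene <= k) for every gene.
    Tissues are resampled with replacement separately within each group."""
    rng = np.random.default_rng(seed)
    n_c, n_n = cancer.shape[0], normal.shape[0]
    hits = np.zeros(cancer.shape[1])
    for _ in range(n_boot):
        boot_cancer = cancer[rng.integers(0, n_c, size=n_c)]
        boot_normal = normal[rng.integers(0, n_n, size=n_n)]
        _, ranks = rank_genes(boot_cancer, boot_normal)
        hits += ranks <= k
    return hits / n_boot

if __name__ == "__main__":
    # Simulated data only: 30 cancer and 23 normal tissues, 50 genes,
    # with the first 3 genes shifted upward in cancer.
    rng = np.random.default_rng(1)
    normal = rng.normal(size=(23, 50))
    cancer = rng.normal(size=(30, 50))
    cancer[:, :3] += 1.5
    aucs, ranks = rank_genes(cancer, normal)
    probs = selection_probability(cancer, normal, k=5)
    for g in np.argsort(ranks)[:5]:
        print(f"gene {g}: AUC={aucs[g]:.3f}, rank={ranks[g]}, P(rank<=5)={probs[g]:.2f}")
```

Resampling within each tissue group keeps the 30/23 case-control split fixed in every bootstrap sample. Because each bootstrap ranking is a permutation of 1..n_genes, the estimated probabilities P(rank <= k) sum to k across genes. Genes down-regulated in cancer have an AUC near 0 under this convention. Whether and how to rank them is a separate modeling choice that this sketch does not address.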

Grants

  1. P50 CA083636/NCI NIH HHS
  2. CA 86368/NCI NIH HHS
  3. GM 54438/NIGMS NIH HHS

MeSH Term

Biomarkers, Tumor
Computer Simulation
Female
Gene Expression Profiling
Gene Expression Regulation, Neoplastic
Humans
Neoplasms
Oligonucleotide Array Sequence Analysis
Ovarian Neoplasms
Proteomics
ROC Curve
Sample Size
Statistics, Nonparametric

Chemicals

Biomarkers, Tumor
