Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset.

Sung E Choe, Michael Boutros, Alan M Michelson, George M Church, Marc S Halfon
Author Information
  1. Sung E Choe: Department of Genetics, Harvard Medical School, New Research Building, 77 Avenue Louis Pasteur, Boston, MA 02115, USA. sung_choe@post.harvard.edu

Abstract

BACKGROUND: As more methods are developed to analyze RNA-profiling data, assessing their performance using control datasets becomes increasingly important.
RESULTS: We present a 'spike-in' experiment for Affymetrix GeneChips that provides a defined dataset of 3,860 RNA species, which we use to evaluate analysis options for identifying differentially expressed genes. The experimental design incorporates two novel features. First, to obtain accurate estimates of false-positive and false-negative rates, 100-200 RNAs are spiked in at each fold-change level of interest, ranging from 1.2 to 4-fold. Second, instead of using an uncharacterized background RNA sample, a set of 2,551 RNA species is used as the constant (1x) set, allowing us to know whether any given probe set is truly present or absent. Application of a large number of analysis methods to this dataset reveals clear variation in their ability to identify differentially expressed genes. False-negative and false-positive rates are minimized when the following options are chosen: subtracting nonspecific signal from the PM probe intensities; performing an intensity-dependent normalization at the probe set level; and incorporating a signal intensity-dependent standard deviation in the test statistic.
CONCLUSIONS: A best-route combination of analysis methods is presented that allows detection of approximately 70% of true positives before reaching a 10% false-discovery rate. We highlight areas in need of improvement, including better estimates of false-discovery rates and lower false-negative rates.
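
The headline figure above (true positives detected before reaching a given false-discovery rate) can only be computed because the identity of every spiked-in RNA is known. The following is a minimal sketch of that calculation, not the authors' code. The function name `sensitivity_at_fdr`, the toy scores, and the choice to report the best sensitivity over all cutoffs whose observed FDR stays at or below the limit are illustrative assumptions; tied scores are not handled specially.

```python
def sensitivity_at_fdr(scores, is_true_positive, max_fdr=0.10):
    """Return the largest fraction of known true positives recovered at any
    score cutoff whose observed false-discovery rate is <= max_fdr.

    scores           -- one test statistic per probe set (higher = more significant)
    is_true_positive -- known truth from the spike-in design (True = fold change != 1)
    """
    # Rank probe sets from most to least significant.
    ranked = sorted(zip(scores, is_true_positive), key=lambda p: p[0], reverse=True)
    total_true = sum(1 for _, truth in ranked if truth)
    if total_true == 0:
        raise ValueError("no true positives in the control dataset")

    best = 0.0
    tp = fp = 0
    for _, truth in ranked:
        if truth:
            tp += 1
        else:
            fp += 1
        observed_fdr = fp / (tp + fp)  # false calls among all calls so far
        if observed_fdr <= max_fdr:
            best = max(best, tp / total_true)
    return best


# Hypothetical toy data: six probe sets, three of them truly changed.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_truth = [True, True, False, True, False, False]
print(sensitivity_at_fdr(toy_scores, toy_truth, max_fdr=0.10))
```

Applied to a real control dataset, the same function would be called once per analysis pipeline, so that pipelines can be compared by the sensitivity they reach at a fixed observed FDR.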

Grants

  1. F32 GM067483/NIGMS NIH HHS
  2. K22 HG002489/NHGRI NIH HHS
  3. F32 GM67483-01A1/NIGMS NIH HHS
  4. K22-HG002489/NHGRI NIH HHS

MeSH Terms

Algorithms
Animals
Drosophila
Gene Expression Profiling
Oligonucleotide Array Sequence Analysis
Oligonucleotide Probes
RNA, Messenger

Chemicals

Oligonucleotide Probes
RNA, Messenger
