A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments.

Fangxin Hong, Rainer Breitling
Author Information
  1. Fangxin Hong: Department of Biostatistics, Division of Information Sciences, City of Hope National Medical Center, Beckman Research Institute, 1500 Duarte Rd, Duarte, CA 91010, USA. fxhong@jimmy.harvard.edu

Abstract

MOTIVATION: The proliferation of public data repositories creates a need for meta-analysis methods that can efficiently evaluate, integrate and validate related datasets produced by independent groups. A t-based approach has been proposed to integrate effect sizes from multiple studies by modeling both intra- and between-study variation. Recently, a non-parametric 'rank product' method, derived from the biological reasoning behind fold-change criteria, has been applied to combine multiple datasets directly into one meta-study. Fisher's inverse χ² method, which depends only on the P-values from the individual analyses of each dataset, has been used in a couple of medical studies. While these methods address the question from different angles, it is not clear how they compare with each other.
RESULTS: We comparatively evaluate three methods: t-based hierarchical modeling, rank products, and Fisher's inverse χ² test with P-values from either the t-based or the rank product method. A simulation study shows that the rank product method generally has higher sensitivity and selectivity than the t-based method in both individual and meta-analysis, especially with small sample sizes and/or large between-study variation. Not surprisingly, the performance of Fisher's χ² method depends strongly on the method used in the individual analysis. Application to real datasets demonstrates that meta-analysis achieves more reliable identification than an individual analysis, and that rank products are more robust in gene ranking, which leads to much higher reproducibility among independent studies. Though t-based meta-analysis greatly improves on the individual analysis, it suffers from a potentially large number of false positives when P-values serve as the threshold. We conclude that careful meta-analysis is a powerful tool for integrating multiple array studies.
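
As an illustration of two of the methods named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes independent per-study P-values for Fisher's inverse χ² method, and it reduces the rank product to the geometric mean of per-study ranks of fold changes. The function names, the input layout (one list of per-gene values per study) and the tie-breaking rule are assumptions made for this example; the permutation step used to assess rank product significance is omitted.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent P-values for one gene across k studies.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    follows a chi-square distribution with 2k degrees of freedom under
    the null hypothesis.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k the chi-square upper tail is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no external library is needed.
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


def rank_products(fold_changes):
    """Geometric mean rank of each gene across studies.

    fold_changes[j][g] is the (log) fold change of gene g in study j
    (hypothetical layout). Rank 1 is the most up-regulated gene in a
    study. Ties are broken by gene index, a simplifying assumption.
    """
    n_studies = len(fold_changes)
    n_genes = len(fold_changes[0])
    log_rank_sum = [0.0] * n_genes
    for values in fold_changes:
        order = sorted(range(n_genes), key=lambda g: values[g], reverse=True)
        for rank, gene in enumerate(order, start=1):
            log_rank_sum[gene] += math.log(rank)
    return [math.exp(s / n_studies) for s in log_rank_sum]
```

A small rank product indicates a gene that ranks consistently near the top across studies. With this sketch, the dependence of Fisher's method on the individual analysis is visible directly: the combined result changes only through the per-study P-values passed in.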

MeSH Terms

Data Interpretation, Statistical
Databases, Protein
Gene Expression Profiling
Meta-Analysis as Topic
Oligonucleotide Array Sequence Analysis
Reproducibility of Results
Sample Size
Sensitivity and Specificity

