β-empirical Bayes inference and model diagnosis of microarray data.

Mohammad Manir Hossain Mollah, M Nurul Haque Mollah, Hirohisa Kishino
Author Information
  1. Mohammad Manir Hossain Mollah: Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan. mollah@lbm.ab.a.u-tokyo.ac.jp

Abstract

BACKGROUND: Microarray data enable the high-throughput survey of mRNA expression profiles at the genomic level; however, the data present a challenging statistical problem because a large number of transcripts are measured with small sample sizes. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of microarray data, no model can explain the data fully. It is generally difficult to scrutinize irregular patterns of expression that are not expected by the usual gene-by-gene statistical models.
RESULTS: As an extension of empirical Bayes (EB) procedures, we have developed the β-empirical Bayes (β-EB) approach based on a β-likelihood measure which can be regarded as an 'evidence-based' weighted (quasi-) likelihood inference. The weight of a transcript t is described as a power function of its likelihood, f^β(y_t|θ). Genes with low likelihoods have unexpected expression patterns and low weights. By assigning low weights to outliers, the inference becomes robust. The value of β, which controls the balance between the robustness and efficiency, is selected by maximizing the predictive β₀-likelihood by cross-validation. The proposed β-EB approach identified six significant (p<10⁻⁵) contaminated transcripts as differentially expressed (DE) in normal/tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach.
CONCLUSIONS: The simulation data and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize the irregular patterns of expression and diagnose the model statistically. When β-weights outside the range of the predicted distribution were observed, a detailed inspection of the data was carried out. The β-weights described here can be applied to other likelihood-based statistical models for diagnosis, and may serve as a useful tool for transcriptome and proteome studies.
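The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' hierarchical β-EB model: it assumes a single normal model with known standard deviation, and the function names (normal_pdf, beta_weights, beta_weighted_mean) and default values are hypothetical. Each observation receives the weight f(y|θ)^β, and the mean is re-estimated as a weighted average until it stabilizes; with β = 0 all weights equal 1 and the ordinary mean is recovered.

```python
import math
import statistics


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def beta_weights(ys, mu, sigma, beta):
    """Weight of each observation: its likelihood raised to the power beta."""
    return [normal_pdf(y, mu, sigma) ** beta for y in ys]


def beta_weighted_mean(ys, sigma=1.0, beta=0.2, n_iter=100, tol=1e-10):
    """Iteratively reweighted estimate of a normal mean (sigma assumed known).

    Illustrative assumption: this solves sum_t w_t * (y_t - mu) = 0 with
    w_t = f(y_t | mu)^beta by fixed-point iteration from the median.
    """
    mu = statistics.median(ys)
    for _ in range(n_iter):
        w = beta_weights(ys, mu, sigma, beta)
        new_mu = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, beta_weights(ys, mu, sigma, beta)


# Made-up toy data: seven values near zero and one gross outlier.
ys = [0.1, -0.3, 0.2, 0.0, -0.1, 0.4, -0.2, 8.0]
mu_robust, weights = beta_weighted_mean(ys, beta=0.2)
mu_plain, _ = beta_weighted_mean(ys, beta=0.0)
```

In this toy example the outlier receives a far smaller weight than the other observations, so it barely affects the β-weighted estimate, and an unusually small weight flags the observation for closer inspection. How β is chosen (by cross-validation in the abstract) is not covered by this sketch.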

MeSH Term

Algorithms
Arabidopsis
Bayes Theorem
Computer Simulation
Gene Expression Profiling
Head and Neck Neoplasms
Humans
Likelihood Functions
Lung Neoplasms
Models, Statistical
Oligonucleotide Array Sequence Analysis
