Bayesian consensus clustering.

Eric F Lock, David B Dunson
Author Information
  1. Eric F Lock: Department of Statistical Science, Duke University, Durham, NC 27708, USA and Center for Human Genetics, Duke University Medical Center, Durham, NC 27710, USA.

Abstract

MOTIVATION: In biomedical research, a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most current approaches to multisource clustering either determine a separate clustering for each data source independently or determine a single 'joint' clustering for all data sources. There is a need for more flexible approaches that simultaneously model the dependence and the heterogeneity of the data sources.
RESULTS: We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently. We present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas.
AVAILABILITY: R code with instructions and examples is available at http://people.duke.edu/%7Eel113/software.html.
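
ILLUSTRATION: The idea in RESULTS, that source-specific clusterings "adhere loosely" to a consensus clustering, can be illustrated with a small simulation. This is a sketch only, not the authors' R implementation and not the estimation procedure. The abstract does not specify the dependence mechanism, so the rule used here is an assumption made for illustration: each object keeps its consensus label with probability `adherence` and otherwise moves to one of the other clusters chosen uniformly at random. The function names and the `adherence` parameter are hypothetical.

```python
import random


def sample_source_labels(consensus, n_clusters, adherence, rng):
    """Draw source-specific labels that loosely follow a consensus clustering.

    Assumed rule (illustrative, not taken from the abstract): keep the
    consensus label with probability `adherence`; otherwise pick one of the
    remaining clusters uniformly at random.
    """
    if n_clusters < 2:
        raise ValueError("n_clusters must be at least 2")
    if not 0.0 <= adherence <= 1.0:
        raise ValueError("adherence must lie in [0, 1]")
    labels = []
    for c in consensus:
        if rng.random() < adherence:
            labels.append(c)  # source agrees with the consensus
        else:
            others = [k for k in range(n_clusters) if k != c]
            labels.append(rng.choice(others))  # source deviates
    return labels


def agreement(a, b):
    """Fraction of objects that receive the same label in both clusterings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
n_clusters = 3
consensus = [rng.randrange(n_clusters) for _ in range(2000)]

# Two hypothetical data sources with different adherence to the consensus.
tight = sample_source_labels(consensus, n_clusters, 0.95, rng)
loose = sample_source_labels(consensus, n_clusters, 0.60, rng)

print("agreement, high adherence:", agreement(consensus, tight))
print("agreement, low adherence: ", agreement(consensus, loose))
```

Under this assumed rule, `adherence = 1` forces every source to share the consensus labels, which corresponds to a single joint clustering, while `adherence = 1 / n_clusters` makes a source's labels independent of the consensus, which corresponds to clustering each source separately. Values in between give the intermediate behavior the abstract describes.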

Grants

  1. R01-ES017436/NIEHS NIH HHS

MeSH Term

Algorithms
Bayes Theorem
Cluster Analysis
Gene Dosage
Genomics
Humans
Models, Statistical
