Bregmannian consensus clustering for cancer subtypes analysis.

Jianqiang Li, Liyang Xie, Yunshen Xie, Fei Wang
Author Information
  1. Jianqiang Li: School of Software, Beijing University of Technology, China. Electronic address: lijianqiang@bjut.edu.cn.
  2. Liyang Xie: School of Software, Beijing University of Technology, China.
  3. Yunshen Xie: School of Software, Beijing University of Technology, China.
  4. Fei Wang: Weill Cornell Medical College, Cornell University, USA.

Abstract

Cancer subtype analysis, as an extension of cancer diagnosis, can be regarded as a consensus clustering problem. This analysis helps provide patients with more accurate treatment. Consensus clustering refers to a situation in which several different clusterings have been obtained for a particular data set, and the goal is to aggregate them into a better clustering solution. In this paper, we propose to generalize the traditional consensus clustering methods in three ways: (1) we provide Bregmannian consensus clustering (BCC), where the loss between the consensus clustering result and all the input clusterings is generalized from the traditional Euclidean distance to a general Bregman loss; (2) we generalize BCC to a weighted case, where each input clustering carries its own weight, providing a better solution for the final clustering result; and (3) we propose a novel semi-supervised consensus clustering, which extends the first two methods with must-link and cannot-link constraints. We then obtain three cancer data sets (breast, lung, and colorectal cancer) from The Cancer Genome Atlas (TCGA). Each data set has three data types (mRNA, microRNA, methylation), and each data type is used to test the clustering accuracy of the proposed algorithms. The experimental results demonstrate that the highest aggregation accuracy of the weighted BCC (WBCC) on the cancer data sets is 90.2%. Moreover, although the lowest accuracy is 62.3%, it is still higher than that of the other methods on the same data set. Therefore, we conclude that our method is more effective than the competing methods.
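
To make the aggregation idea concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes each input clustering is encoded as a 0/1 co-association matrix and relies on a standard property of Bregman divergences: the weighted arithmetic mean minimizes the weighted sum of divergences from the inputs to a single representative. All function names (`co_association`, `weighted_consensus`, `total_loss`) and the toy labelings are hypothetical, and the must-link and cannot-link constraints of the semi-supervised variant are not modeled here.

```python
import numpy as np

def co_association(labels):
    """n x n matrix with 1.0 where two samples share a cluster label, else 0.0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def squared_euclidean(x, y):
    """Bregman divergence generated by phi(x) = ||x||^2."""
    return float(np.sum((x - y) ** 2))

def generalized_kl(x, y, eps=1e-12):
    """Generalized KL divergence, the Bregman divergence of phi(x) = sum x log x.
    Entries are clipped to eps so that zeros do not produce log(0)."""
    x = np.clip(x, eps, None)
    y = np.clip(y, eps, None)
    return float(np.sum(x * np.log(x / y) - x + y))

def weighted_consensus(labelings, weights=None):
    """Weighted mean of the input co-association matrices.
    Uniform weights correspond to the unweighted case."""
    mats = np.stack([co_association(l) for l in labelings])
    if weights is None:
        weights = np.ones(len(mats))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the weight vector with the first axis of the (m, n, n) stack.
    return np.tensordot(weights, mats, axes=1)

def total_loss(labelings, consensus, divergence, weights=None):
    """Weighted sum of divergences from each input clustering to the consensus."""
    if weights is None:
        weights = np.ones(len(labelings))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(
        w * divergence(co_association(l), consensus)
        for w, l in zip(weights, labelings)
    )

# Three toy clusterings of four samples (hypothetical data).
labelings = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
consensus = weighted_consensus(labelings)
downweighted = weighted_consensus(labelings, weights=[1.0, 1.0, 0.0])
```

In this sketch, entry `(i, j)` of `consensus` is the weighted fraction of input clusterings that place samples `i` and `j` together. A final partition would still have to be extracted from that matrix, for example by running a clustering algorithm on it, which is left out here.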

Keywords

MeSH Term

Algorithms
Cluster Analysis
Gene Expression Profiling
Methylation
MicroRNAs
Neoplasms
RNA, Messenger

Chemicals

MicroRNAs
RNA, Messenger
