FctClus: A Fast Clustering Algorithm for Heterogeneous Information Networks.

Jing Yang, Limin Chen, Jianpei Zhang
Author Information
  1. Jing Yang: Institute of Computer Science and Technology, Harbin Engineering University, Harbin, China.
  2. Limin Chen: Institute of Computer Science and Technology, Harbin Engineering University, Harbin, China; Institute of Computer Science and Technology, Mudanjiang Teachers College, Mudanjiang, China.
  3. Jianpei Zhang: Institute of Computer Science and Technology, Harbin Engineering University, Harbin, China.

Abstract

Clustering heterogeneous information networks is an important task. This paper proposes a fast clustering algorithm for heterogeneous information networks with a star network schema; it is based on an approximate commute time embedding and exploits the sparsity of such networks. First, the heterogeneous information network is transformed into multiple compatible bipartite graphs from the point of view of compatibility. Second, the approximate commute time embedding of each bipartite graph is computed using random mapping and a linear time solver. The indicator subsets of all embeddings jointly determine the target dataset. Finally, a general model is formulated from these indicator subsets, and a fast algorithm is derived that clusters all indicator subsets simultaneously, using the sum of the weighted distances of all indicators for the same target object. Theoretical analysis and experimental verification show that the proposed algorithm, FctClus, is efficient and generalizable, with high clustering accuracy and fast computation.
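
The embedding step can be illustrated with a minimal sketch. It is not the authors' implementation: it follows the standard construction in which the columns of sqrt(V) W^(1/2) B L^+ embed the nodes so that squared Euclidean distances equal commute times, and a random ±1 projection reduces the number of rows. The function name `commute_time_embedding`, its parameters, and the toy matrix are hypothetical, and a dense pseudoinverse stands in for the linear time solver mentioned in the abstract, so the sketch only suits small graphs.

```python
import numpy as np

def commute_time_embedding(A, k=None, seed=0):
    """Embed the nodes of a connected bipartite graph.

    A    : (n_targets, n_attrs) array of nonnegative edge weights (assumed input format).
    k    : number of random projections; None gives the exact embedding.
    Returns Z with one column per node (targets first, then attributes);
    squared distances between columns give (approximate) commute times.
    """
    A = np.asarray(A, dtype=float)
    n_t, n_a = A.shape
    n = n_t + n_a
    rows, cols = np.nonzero(A)
    w = A[rows, cols]                      # edge weights
    m = len(w)

    # Signed edge-node incidence matrix B (m x n).
    B = np.zeros((m, n))
    B[np.arange(m), rows] = 1.0
    B[np.arange(m), n_t + cols] = -1.0

    L = B.T @ (w[:, None] * B)             # graph Laplacian, B^T W B
    volume = 2.0 * w.sum()                 # sum of weighted degrees

    Y = np.sqrt(w)[:, None] * B            # W^(1/2) B
    if k is not None:
        # Random mapping: k x m matrix with entries +-1/sqrt(k).
        rng = np.random.default_rng(seed)
        Q = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(k)
        Y = Q @ Y

    # Stand-in for the fast solver: each row z of Z solves L z = y.
    return np.sqrt(volume) * (Y @ np.linalg.pinv(L))

# Toy graph: 3 target objects, 4 attribute objects, forming a path.
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]])
Z_exact = commute_time_embedding(A)
Z_approx = commute_time_embedding(A, k=2000)
exact = np.sum((Z_exact[:, 0] - Z_exact[:, 2]) ** 2)
approx = np.sum((Z_approx[:, 0] - Z_approx[:, 2]) ** 2)
print(exact, approx)
```

In this sketch the first `n_targets` columns of each embedding play the role of the indicators of the target objects. One embedding would be computed per bipartite graph, and the clustering step would then combine the weighted distances across embeddings.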

MeSH Term

Algorithms
Cluster Analysis
Information Services
