Count data modeling and classification using finite mixtures of distributions.

Nizar Bouguila
Author Information
  1. Nizar Bouguila: Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC H3G 1T7, Canada. bouguila@ciise.concordia.ca

Abstract

In this paper, we consider the problem of constructing accurate and flexible statistical representations for count data, which arise in many areas such as data mining, computer vision, and information retrieval. In particular, we analyze and compare several generative approaches widely used for count data clustering, namely the multinomial, multinomial Dirichlet, and multinomial generalized Dirichlet mixture models. Moreover, we propose a clustering approach based on a mixture model whose components compound the multinomial with the Beta-Liouville distribution, a member of the Liouville family of distributions. The proposed model, which we call the multinomial Beta-Liouville mixture, is learned using deterministic annealing expectation-maximization together with the minimum description length criterion, and aims to achieve high accuracy in both count data clustering and model selection. An important feature of the multinomial Beta-Liouville mixture is that it has fewer parameters than the recently proposed multinomial generalized Dirichlet mixture. Extensive empirical experiments on text modeling and classification, image texture modeling and classification, and shape modeling highlight the merits of the proposed models and approaches.
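As a rough illustration of the estimation machinery named in the abstract, the following Python sketch fits a finite mixture to count vectors with deterministic annealing EM and compares candidate numbers of components with an MDL score. It is a simplified stand-in, not the paper's method: each component here is a plain multinomial (one of the baselines the paper compares against) rather than a multinomial Beta-Liouville, and the annealing schedule `betas`, the additive smoothing term, the iteration count, the toy data, and all function names are hypothetical choices made for this example. The E-step raises each component's weighted likelihood to an inverse temperature beta <= 1 before normalizing, in the style of deterministic annealing EM, and the MDL score uses the common two-part form -log L + (N_p / 2) log N.

```python
import math
import random


def log_multinomial_pmf(x, theta):
    """Log multinomial probability of count vector x under probabilities theta."""
    n = sum(x)
    coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in x)
    return coef + sum(c * math.log(t) for c, t in zip(x, theta) if c > 0)


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def daem_multinomial_mixture(X, K, betas=(0.2, 0.4, 0.6, 0.8, 1.0),
                             iters=50, smooth=1e-3, seed=0):
    """Deterministic annealing EM for a K-component multinomial mixture.

    betas is a hypothetical inverse-temperature schedule ending at 1.0,
    where the E-step reduces to ordinary EM. Returns (weights, thetas, loglik).
    """
    rng = random.Random(seed)
    D = len(X[0])
    weights = [1.0 / K] * K
    thetas = []
    for _ in range(K):
        raw = [rng.uniform(0.5, 1.5) for _ in range(D)]
        s = sum(raw)
        thetas.append([r / s for r in raw])

    for beta in betas:
        for _ in range(iters):
            # E-step: tempered responsibilities proportional to (pi_k * p(x | theta_k)) ** beta
            resp = []
            for x in X:
                logs = [beta * (math.log(w) + log_multinomial_pmf(x, th))
                        for w, th in zip(weights, thetas)]
                z = log_sum_exp(logs)
                resp.append([math.exp(l - z) for l in logs])
            # M-step: mixing weights and smoothed, responsibility-weighted count frequencies
            for k in range(K):
                nk = sum(r[k] for r in resp)
                weights[k] = max(nk / len(X), 1e-12)
                counts = [smooth + sum(r[k] * x[d] for r, x in zip(resp, X))
                          for d in range(D)]
                total = sum(counts)
                thetas[k] = [c / total for c in counts]

    # Untempered log-likelihood of the data under the final parameters
    loglik = sum(log_sum_exp([math.log(w) + log_multinomial_pmf(x, th)
                              for w, th in zip(weights, thetas)]) for x in X)
    return weights, thetas, loglik


def mdl_score(X, K, **kwargs):
    """Two-part MDL: -log L + (N_p / 2) log N, with N_p free parameters."""
    _, _, loglik = daem_multinomial_mixture(X, K, **kwargs)
    D = len(X[0])
    n_params = (K - 1) + K * (D - 1)  # mixing weights + multinomial probabilities
    return -loglik + 0.5 * n_params * math.log(len(X))


# Hypothetical toy data: two groups of 20 identical count vectors over 4 words.
X = [[9, 8, 1, 0] for _ in range(20)] + [[0, 1, 8, 9] for _ in range(20)]
scores = {K: mdl_score(X, K) for K in (1, 2, 3)}
best_K = min(scores, key=scores.get)
print(best_K, {K: round(s, 1) for K, s in scores.items()})
```

With two well-separated groups of count vectors, the MDL score should favor K = 2 here. Adapting the sketch to another component density would mean replacing `log_multinomial_pmf` and the M-step update; for compound models such as the multinomial Beta-Liouville, that update typically has no closed form and would need an iterative optimizer.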

MeSH Terms

Algorithms
Artificial Intelligence
Computer Simulation
Data Mining
Electronic Data Processing
Humans
Mathematical Concepts
Models, Theoretical
Neural Networks, Computer
Pattern Recognition, Automated
