Outlier Identification in Model-Based Cluster Analysis.

Katie Evans, Tanzy Love, Sally W Thurston
Author Information
  1. Katie Evans: Dupont, DuET Applied Statistics, Delaware, USA.
  2. Tanzy Love: University of Rochester, Rochester, USA.
  3. Sally W Thurston: University of Rochester, Rochester, USA.

Abstract

In model-based clustering with normal-mixture models, a few outlying observations can influence both the estimated cluster structure and the number of clusters. This paper develops a method to identify such observations; it does not attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion, or those for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate that our method detects true outliers without falsely identifying many non-outliers and, under most scenarios, performs better than other approaches. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance that avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data.
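
The two criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure and does not use MCLUST: it assumes hard cluster labels have already been obtained from some clustering fit, uses the sum of per-dimension sample variances as a stand-in for the cluster-specific variance, and the function name flag_outliers and the thresholds min_prop and max_var_ratio are hypothetical choices made only for illustration.

```python
import numpy as np

def flag_outliers(X, labels, min_prop=0.05, max_var_ratio=3.0):
    """Illustrative only: flag points by two rough criteria.

    min_prop and max_var_ratio are hypothetical thresholds, not values
    taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat 1-D input as n observations of one variable
    labels = np.asarray(labels)
    n = len(labels)
    flagged = np.zeros(n, dtype=bool)

    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)

        # Criterion 1: the cluster holds a minimal share of the observations.
        if len(idx) / n < min_prop:
            flagged[idx] = True
            continue

        # Leave-one-out variance needs at least two remaining points.
        if len(idx) < 3:
            continue

        # Total sample variance of the cluster with every point included.
        full_var = X[idx].var(axis=0, ddof=1).sum()

        # Criterion 2: the variance with and without the point differs a lot.
        for i in idx:
            rest = idx[idx != i]
            loo_var = X[rest].var(axis=0, ddof=1).sum()
            if loo_var > 0 and full_var / loo_var > max_var_ratio:
                flagged[i] = True

    return flagged
```

Calling flag_outliers(X, labels) returns a boolean array with one entry per observation. In a real analysis the labels would come from a fitted normal-mixture model, and the comparison of variances would be calibrated rather than fixed at an arbitrary ratio.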

Grants

  1. P30 ES001247/NIEHS NIH HHS
  2. T32 ES007271/NIEHS NIH HHS
