Trajectory clustering using mixed classification models.

Amna Klich, René Ecochard, Fabien Subtil
Author Information
  1. Amna Klich: Université de Lyon, Lyon, France.
  2. René Ecochard: Université de Lyon, Lyon, France.
  3. Fabien Subtil: Université de Lyon, Lyon, France.

Abstract

Trajectory classification has become frequent in clinical research as a way to understand the heterogeneity of individual trajectories. The standard classification model for trajectories assumes no between-individual variance within groups. However, this assumption is often inappropriate; when it fails, the error variance of the model may be overestimated, which leads to a biased classification. Hence, two extensions of the standard classification model were developed within a mixed-model framework. The first assumes an equal between-individual variance across groups, and the second allows unequal between-individual variances. Simulations were performed to evaluate the impact of these assumptions on the classification. The simulation results showed that, when a true between-individual variance exists within groups, the first extended model gives a lower misclassification percentage than the standard one (with differences up to 50%). The second model decreases the misclassification percentage compared with the first one (by up to 11%) when the between-individual variance differs between groups. However, both extensions require a high number of repeated measurements to be fitted correctly. Using human chorionic gonadotropin trajectories after curettage for hydatidiform mole, the standard classification model classified trajectories mainly according to their levels, whereas the two extended models classified them according to their patterns, which provided more clinically relevant groups. In conclusion, for studies with a nonnegligible number of repeated measurements, it appears more appropriate to use, in the first instance, a classification model that assumes equal between-individual variance across groups rather than a standard classification model. A model that allows unequal between-individual variances may find its place thereafter.
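
The abstract's point that ignoring between-individual variance inflates the error variance can be illustrated with a small simulation. The sketch below is not the authors' method or algorithm: it assumes a single group, a linear mean curve, and a random intercept only, and it replaces a proper mixed-model fit with a simple moment-based subtraction of each individual's mean deviation. All function names, parameter values, and the seed are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_group(n_individuals, n_times, slope, sd_between, sd_error, rng):
    """Simulate trajectories that share one linear group curve.

    Each individual gets its own random intercept (assumed form of the
    between-individual variance) plus independent measurement error.
    """
    trajectories = []
    for _ in range(n_individuals):
        b_i = rng.gauss(0.0, sd_between)  # individual shift around the group curve
        trajectories.append(
            [slope * t + b_i + rng.gauss(0.0, sd_error) for t in range(n_times)]
        )
    return trajectories


def group_mean_curve(trajectories):
    """Pointwise mean of all trajectories in the group."""
    n_times = len(trajectories[0])
    return [statistics.fmean([traj[t] for traj in trajectories]) for t in range(n_times)]


def residual_variance_standard(trajectories):
    """Residual variance when only the group curve is modelled."""
    curve = group_mean_curve(trajectories)
    residuals = [y - m for traj in trajectories for y, m in zip(traj, curve)]
    return statistics.pvariance(residuals)


def residual_variance_random_intercept(trajectories):
    """Residual variance after also removing each individual's mean shift.

    This is a crude stand-in for a random-intercept model, not a mixed-model fit.
    """
    curve = group_mean_curve(trajectories)
    residuals = []
    for traj in trajectories:
        deviations = [y - m for y, m in zip(traj, curve)]
        shift = statistics.fmean(deviations)  # estimated individual intercept
        residuals.extend(d - shift for d in deviations)
    return statistics.pvariance(residuals)


rng = random.Random(2018)  # arbitrary seed, for reproducibility only
group = simulate_group(
    n_individuals=200, n_times=10, slope=-0.5, sd_between=2.0, sd_error=1.0, rng=rng
)
var_standard = residual_variance_standard(group)
var_mixed = residual_variance_random_intercept(group)
print(f"residual variance, group curve only:      {var_standard:.2f}")
print(f"residual variance, with random intercept: {var_mixed:.2f}")
```

Under these assumed settings, the first figure should come out near the sum of the between-individual and error variances (about 5), while the second should stay close to the error variance alone (about 1). An inflated error variance of this kind is what the abstract says can bias the classification.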

MeSH Terms

Cluster Analysis
Computer Simulation
Female
Humans
Pregnancy