Multivariate Poisson lognormal distribution for modeling counts from modern biological data: An overview.

Sanjeena Subedi, Utkarsh J Dang
Author Information
  1. Sanjeena Subedi: School of Mathematics & Statistics, Carleton University, Ontario, Canada.
  2. Utkarsh J Dang: Department of Health Sciences, Carleton University, Ontario, Canada.

Abstract

Modern biological data are often multivariate discrete counts, yet few statistical distributions can model such counts directly and efficiently. While mixed Poisson distributions, e.g., the negative binomial distribution, are often the distributions of choice for univariate data, multivariate statistical distributions and their algorithmic implementations tend to have various drawbacks, e.g., intractable distributions, no closed-form solutions for parameter estimates, constrained correlation structures, and slow convergence during iterative parameter estimation. Herein, we provide an overview of the Poisson lognormal and multivariate Poisson lognormal distributions. These distributions can be written in a hierarchical fashion. An efficient variational approximation-based parameter estimation strategy, as well as a hybrid approach for full Bayesian posterior estimation, is available for such models, allowing them to scale to high-dimensional data. We compare the univariate Poisson, negative binomial, and Poisson lognormal distributions in terms of their estimated mean-variance relationships using simulations and example real datasets. We also discuss the properties of the multivariate Poisson lognormal distribution, including its ability to directly model count data with zero counts, over-dispersion, and both positive and negative covariance elements, as well as the mapping between correlations in the latent space and those in the observed space. Finally, we illustrate the use of these distributions through two model-based clustering examples, using a mixture-of-distributions approach, on RNA-seq and microbiome data.
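
The hierarchical form mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch rather than the authors' implementation: it assumes the standard construction in which a latent vector is drawn from a multivariate normal distribution and each count is then drawn from a Poisson distribution with rate equal to the exponential of the corresponding latent value. The function name `sample_mpln` and the parameter values `mu` and `sigma` are hypothetical and chosen only for illustration. The closed-form moments used for comparison are the well-known ones for the Poisson lognormal distribution.

```python
import numpy as np


def sample_mpln(mu, sigma, n, seed=0):
    """Draw n vectors from a multivariate Poisson lognormal (hypothetical helper).

    Hierarchy assumed here:
        theta ~ Normal(mu, sigma)        (latent space)
        Y_j | theta_j ~ Poisson(exp(theta_j))   (observed counts)
    """
    rng = np.random.default_rng(seed)
    theta = rng.multivariate_normal(mu, sigma, size=n)
    return rng.poisson(np.exp(theta))


# Illustrative parameters (assumptions, not taken from the paper).
mu = np.array([1.0, 1.5])
sigma = np.array([[0.5, -0.3],
                  [-0.3, 0.4]])  # negative latent covariance

counts = sample_mpln(mu, sigma, n=50_000)

# Closed-form marginal moments of the Poisson lognormal.
mean_theory = np.exp(mu + np.diag(sigma) / 2)
var_theory = mean_theory + mean_theory**2 * (np.exp(np.diag(sigma)) - 1)
cov_theory = mean_theory[0] * mean_theory[1] * (np.exp(sigma[0, 1]) - 1)

sample_mean = counts.mean(axis=0)
sample_var = counts.var(axis=0, ddof=1)
sample_cov = np.cov(counts, rowvar=False)[0, 1]

print("mean      (sample, theory):", sample_mean, mean_theory)
print("variance  (sample, theory):", sample_var, var_theory)
print("covariance(sample, theory):", sample_cov, cov_theory)
```

In this sketch the variance exceeds the mean in each dimension (over-dispersion), and the negative off-diagonal element of `sigma` carries through to a negative covariance between the observed counts, although the observed-space covariance differs in magnitude from the latent one.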
