- Sanjeena Subedi: School of Mathematics & Statistics, Carleton University, Ontario, Canada.
- Utkarsh J Dang: Department of Health Sciences, Carleton University, Ontario, Canada.
Modern biological data are often multivariate discrete counts, yet there has been a dearth of statistical distributions that model such counts directly and efficiently. While mixed Poisson distributions, e.g., the negative binomial distribution, are often the distributions of choice for univariate data, multivariate statistical distributions and their algorithmic implementations tend to have various drawbacks, e.g., intractable distributions, parameter estimates without closed-form solutions, constrained correlation structures, and slow convergence during iterative parameter estimation. Herein, we provide an overview of the Poisson lognormal and multivariate Poisson lognormal distributions. These distributions can be written in a hierarchical fashion. An efficient variational approximation-based parameter estimation strategy, as well as a hybrid approach for full Bayesian posterior estimation, is available for such models, allowing them to scale to high-dimensional data. We compare the univariate Poisson, negative binomial, and Poisson lognormal distributions in terms of their estimated mean-variance relationships using simulations and real example datasets. We also discuss the properties of the multivariate Poisson lognormal distribution, including its ability to directly model count data with zero counts, over-dispersion, and both positive and negative covariance elements, and the mapping between correlations in the latent space and the observed space. Finally, we illustrate their use through two model-based clustering examples using a mixture-of-distributions approach on RNA-seq and microbiome data.
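As a minimal sketch of the hierarchical formulation mentioned above, the Python snippet below draws a latent Gaussian vector and then samples conditionally independent Poisson counts with log-rates given by that vector. The function names (`sample_mpln`, `mpln_moments`), the parameter values, and the seed are illustrative assumptions, not taken from the text. The closed-form moments used for comparison are the standard results for this model: E[Y_j] = exp(mu_j + sigma_jj / 2), Var(Y_j) = E[Y_j] + E[Y_j]^2 (exp(sigma_jj) - 1), and Cov(Y_i, Y_j) = E[Y_i] E[Y_j] (exp(sigma_ij) - 1).

```python
import numpy as np


def sample_mpln(mu, sigma, n, seed=0):
    """Draw n count vectors from a multivariate Poisson lognormal model.

    Hierarchy: theta ~ N(mu, sigma), then Y_j | theta_j ~ Poisson(exp(theta_j)).
    """
    rng = np.random.default_rng(seed)
    theta = rng.multivariate_normal(mu, sigma, size=n)  # latent Gaussian layer
    return rng.poisson(np.exp(theta))                   # observed count layer


def mpln_moments(mu, sigma):
    """Closed-form mean vector and covariance matrix of the observed counts."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mean = np.exp(mu + 0.5 * np.diag(sigma))
    cov = np.outer(mean, mean) * (np.exp(sigma) - 1.0)
    cov[np.diag_indices_from(cov)] += mean  # Poisson noise adds to the variances
    return mean, cov


# Illustrative parameters (assumed): a negative latent covariance between two variables.
mu = np.array([1.0, 0.5])
sigma = np.array([[0.5, -0.3],
                  [-0.3, 0.4]])

counts = sample_mpln(mu, sigma, n=200_000)
mean, cov = mpln_moments(mu, sigma)

print("empirical mean:", counts.mean(axis=0))
print("theoretical mean:", mean)
print("empirical covariance:\n", np.cov(counts.T))
print("theoretical covariance:\n", cov)
```

In this sketch the theoretical variances exceed the means (over-dispersion), and the negative off-diagonal entry of `sigma` yields a negative covariance between the observed counts, although its magnitude differs from the latent one because of the exponential link.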