Bayesian Generalized Linear Models for Analyzing Compositional and Sub-Compositional Microbiome Data via EM Algorithm.

Li Zhang, Zhenying Ding, Jinhong Cui, Xiaoxiao Zhou, Nengjun Yi
Author Information
  1. Li Zhang: Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, Pennsylvania, USA.
  2. Zhenying Ding: Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, USA.
  3. Jinhong Cui: Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, USA.
  4. Xiaoxiao Zhou: Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, USA.
  5. Nengjun Yi: Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, USA.

Abstract

The study of compositional microbiome data is critical for exploring the functional roles of microbial communities in human health and disease. Recent approaches have shifted from traditional log-ratio transformations of compositional covariates to imposing a sum-to-zero constraint on the corresponding regression coefficients. Various methods, including penalized regression and Markov chain Monte Carlo (MCMC) algorithms, have been extended to enforce this constraint. However, these methods have limitations: penalized regression yields only point estimates, which limits uncertainty assessment, while MCMC methods, although reliable, are computationally intensive, particularly for high-dimensional data. To address these challenges, we propose Bayesian generalized linear models (GLMs) for analyzing compositional and sub-compositional microbiome data. Our model places a spike-and-slab double-exponential prior on the microbiome coefficients, which induces weak shrinkage on large coefficients and strong shrinkage on irrelevant ones, making it well suited to high-dimensional microbiome data. The sum-to-zero constraint is handled through soft centering, by placing a prior distribution on the sum of the compositional or sub-compositional coefficients. To reduce computational cost, we developed a fast and stable algorithm that incorporates expectation-maximization (EM) steps into the standard iteratively weighted least squares (IWLS) algorithm for fitting GLMs. The performance of the proposed method was assessed through extensive simulation studies, which show that our approach outperforms existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to identify microorganisms linked to inflammatory bowel disease (IBD). The methods are implemented in the freely available R package BhGLM (https://github.com/nyiuab/BhGLM).
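The abstract describes the fitting strategy only in words. What follows is a minimal, hypothetical Python sketch of one way an EM-IWLS loop of this kind could look for a logistic model; it is not the authors' BhGLM (R) implementation. It assumes that the spike-and-slab double-exponential prior is handled through the normal scale-mixture representation of the Laplace distribution, and that soft centering is a N(0, eps^2) prior on the sum of the compositional slopes. The names `em_iwls_logistic`, `s0`, `s1`, `theta`, and `eps`, their default values, and the simulated data are all illustrative assumptions.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def sigmoid(e):
    """Numerically stable logistic function."""
    if e >= 0:
        return 1.0 / (1.0 + math.exp(-e))
    t = math.exp(e)
    return t / (1.0 + t)


def em_iwls_logistic(X, y, s0=0.04, s1=1.0, theta=0.5, eps=0.01, iters=100, tol=1e-6):
    """Hypothetical EM-IWLS sketch: each slope has prior DE(0, S_j), where S_j is
    s0 (spike) or s1 (slab) with slab probability theta, and a soft sum-to-zero
    prior sum(slopes) ~ N(0, eps^2). Column 0 of X is an unpenalized intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    prec = [0.0] + [1.0] * (p - 1)  # first pass uses a N(0, 1) ridge prior on slopes
    for _ in range(iters):
        # IWLS: working weights w and working response z at the current beta
        eta = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        mu = [sigmoid(e) for e in eta]
        w = [max(m * (1.0 - m), 1e-10) for m in mu]
        z = [eta[i] + (y[i] - mu[i]) / w[i] for i in range(n)]
        # Penalized normal equations (X'WX + diag(prec) + 11'/eps^2) beta = X'Wz;
        # the rank-one term (slopes only) comes from the prior on sum(slopes)
        A = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
             for j in range(p)]
        for j in range(p):
            A[j][j] += prec[j]
        for j in range(1, p):
            for k in range(1, p):
                A[j][k] += 1.0 / eps ** 2
        rhs = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        new = solve(A, rhs)
        done = max(abs(a - b) for a, b in zip(new, beta)) < tol
        beta = new
        # E-step: posterior slab probability p_j, then the expected normal
        # precision E[1/tau_j^2] = E[1/S_j] / |beta_j| of the Laplace mixture
        for j in range(1, p):
            odds = (1 - theta) / theta * (s1 / s0) * math.exp(-abs(beta[j]) * (1 / s0 - 1 / s1))
            pj = 1.0 / (1.0 + odds)
            prec[j] = (pj / s1 + (1 - pj) / s0) / max(abs(beta[j]), 1e-4)
        if done:
            break
    return beta


# Toy demo on simulated compositions (hypothetical data, not from the study):
# only taxa 1 and 2 matter, with effects +1.5 and -1.5 that sum to zero.
random.seed(1)
beta_true = [0.0, 1.5, -1.5, 0.0, 0.0, 0.0]
X, y = [], []
for _ in range(200):
    g = [random.gauss(0, 1) for _ in range(5)]
    lse = math.log(sum(math.exp(v) for v in g))
    row = [1.0] + [v - lse for v in g]  # intercept + log relative abundances
    X.append(row)
    y.append(1 if random.random() < sigmoid(sum(a * b for a, b in zip(row, beta_true))) else 0)
beta_hat = em_iwls_logistic(X, y)
print([round(b, 2) for b in beta_hat])
```

Writing the soft constraint as a quadratic penalty only adds the constant rank-one term 11'/eps^2 to the slope block of the precision matrix. Each iteration therefore remains a single weighted least-squares solve, which is the computational appeal of folding the E-step into IWLS rather than sampling with MCMC.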

MeSH Terms

Bayes Theorem
Algorithms
Humans
Linear Models
Microbiota
Computer Simulation
Monte Carlo Method
Markov Chains