BZINB model-based pathway analysis and module identification facilitates integration of microbiome and metabolome data.

Bridget Lin, Hunyong Cho, Chuwen Liu, Jeff Roach, Apoena Aguiar Ribeiro, Kimon Divaris, Di Wu
Author Information
  1. Bridget Lin: Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  2. Hunyong Cho: Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  3. Chuwen Liu: Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  4. Jeff Roach: Research Computing, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  5. Apoena Aguiar Ribeiro: Division of Diagnostic Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  6. Kimon Divaris: Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  7. Di Wu: Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.

Abstract

Integration of multi-omics data is a challenging but necessary step to advance our understanding of the biology underlying human health and disease processes. To date, investigations seeking to integrate multi-omics data (e.g., microbiome and metabolome) have employed simple correlation-based network analyses; however, these methods are not always well-suited for microbiome analyses because they do not accommodate the excess zeros typically present in these data. In this paper, we introduce a bivariate zero-inflated negative binomial (BZINB) model-based network and module analysis method that addresses this limitation and improves microbiome-metabolome correlation-based model fitting by accommodating excess zeros. We use real and simulated data based on a multi-omics study of childhood oral health (ZOE 2.0; investigating early childhood dental disease, ECC) and find that the BZINB model-based correlation method is more accurate than Spearman's rank and Pearson correlations in approximating the underlying relationships between microbial taxa and metabolites. The new method, BZINB-iMMPath, facilitates the construction of metabolite-species and species-species correlation networks using BZINB and identifies modules of correlated species by combining BZINB and similarity-based clustering. Perturbations in correlation networks and modules can be efficiently tested between groups (e.g., healthy and diseased study participants). Upon application of the new method to the ZOE 2.0 study microbiome-metabolome data, we find that several biologically relevant correlations of ECC-associated microbial taxa with carbohydrate metabolites differ between healthy and dental caries-affected participants.
In sum, we find that the BZINB model is a useful alternative to Spearman or Pearson correlations for estimating the underlying correlation of zero-inflated bivariate count data and thus is suitable for integrative analyses of multi-omics data such as those encountered in microbiome and metabolome studies.
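
The motivation above, that excess zeros distort naive correlation estimates, can be illustrated with a small simulation. The sketch below is not the BZINB model or the BZINB-iMMPath implementation; it is a simplified stand-in under stated assumptions: a pair of counts is generated from a gamma-Poisson mixture with a shared gamma component (so the two counts are correlated and overdispersed), and zeros are then injected independently into each variable. The function names (`poisson`, `pearson`, `simulate`) and all parameter values are hypothetical choices made for illustration only.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small rates used in this sketch (exp(-lam) must not underflow).
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, shared_shape=4.0, own_shape=1.0, scale=2.0,
             p_zero=0.4, seed=1):
    """Return (latent, observed) lists of count pairs.

    ASSUMPTION: correlation comes from a shared gamma rate component, and
    excess zeros are injected independently per variable with probability
    p_zero. This is a simplification, not the paper's parametrization.
    """
    rng = random.Random(seed)
    latent, observed = [], []
    for _ in range(n):
        r0 = rng.gammavariate(shared_shape, scale)  # shared component
        r1 = rng.gammavariate(own_shape, scale)     # specific to x
        r2 = rng.gammavariate(own_shape, scale)     # specific to y
        x = poisson(rng, r0 + r1)
        y = poisson(rng, r0 + r2)
        latent.append((x, y))
        # Inject excess zeros independently into each coordinate.
        x_obs = 0 if rng.random() < p_zero else x
        y_obs = 0 if rng.random() < p_zero else y
        observed.append((x_obs, y_obs))
    return latent, observed


if __name__ == "__main__":
    latent, observed = simulate()
    print("Pearson on latent counts:       ", round(pearson(*zip(*latent)), 3))
    print("Pearson on zero-inflated counts:", round(pearson(*zip(*observed)), 3))
```

In this toy setting the Pearson correlation computed on the zero-inflated counts is markedly attenuated relative to the correlation of the latent counts. This is the kind of gap that a model with an explicit zero-inflation component is intended to close.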

Grants

  1. R03 DE028983/NIDCR NIH HHS
  2. U01 DE025046/NIDCR NIH HHS
