Novel Application of Survival Models for Predicting Microbial Community Transitions with Variable Selection for Environmental DNA.

Paul Bjorndahl, Joseph P Bielawski, Lihui Liu, Wei Zhou, Hong Gu
Author Information
  1. Paul Bjorndahl: Department of Mathematics & Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
  2. Joseph P Bielawski: Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
  3. Lihui Liu: Department of Mathematics & Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
  4. Wei Zhou: Department of Mathematics & Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
  5. Hong Gu: Department of Mathematics & Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.

Abstract

Survival analysis is a widely used statistical tool in medicine for inferring the risk and timing of disease-related events. However, it is underutilized in microbiome research for predicting microbial community-mediated events, partly because of the sparse, high-dimensional nature of the data. We advance the application of Cox proportional hazards (Cox PH) survival models to environmental DNA (eDNA) data, using feature selection suited to filtering out irrelevant and redundant taxonomic variables. Selection methods are compared in terms of false positives, sensitivity, and survival estimation accuracy, both in simulation and in a real data setting in which harmful cyanobacterial blooms are forecast. A novel extension of a method for selecting microbial biomarkers to survival data (SuRFCox) reliably outperforms the other methods. We find that Cox PH models with SuRFCox-selected predictors are more robust to variation in signal, noise, and data correlation structure. SuRFCox also yields the most accurate and consistent bloom predictions under cross-validated testing by year over eight different bloom seasons. Identifying biomarkers common to validated survival forecasts across changing conditions has clear biological significance. Survival models with such biomarkers inform risk assessment and provide insight into the causes of critical community transitions. In this paper, we report a novel approach to selecting microorganisms for model-based prediction of the time to critical microbially modulated events (e.g., harmful algal blooms, clinical outcomes, and community shifts). Our method for identifying biomarkers from large, dynamic microbial communities has broad utility for environmental and ecological impact risk assessment and for public health. The results will also promote theoretical and practical advances relevant to the biology of specific organisms.
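For reference, the standard Cox PH formulation is sketched below. The notation is ours and may differ from the paper's: $x_i$ is the vector of taxon abundances for sample $i$, $\beta$ the coefficient vector, $h_0(t)$ an unspecified baseline hazard, $\delta_i = 1$ if the event (for example, bloom onset) is observed for sample $i$ and $0$ if it is censored, and $R(t_i) = \{ j : t_j \ge t_i \}$ the risk set at time $t_i$. The partial likelihood is shown without a tie correction.

```latex
h(t \mid x_i) = h_0(t)\,\exp\!\left(\beta^\top x_i\right),
\qquad
L(\beta) = \prod_{i:\,\delta_i = 1}
\frac{\exp\!\left(\beta^\top x_i\right)}
     {\sum_{j \in R(t_i)} \exp\!\left(\beta^\top x_j\right)}
```

In this framing, selecting taxonomic predictors amounts to deciding which coordinates of $\beta$ are allowed to be nonzero.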
To address the unique challenge posed by diverse environmental conditions and sparsely detected microbes, we developed a novel method of selecting predictors for modeling time-to-event data. Competing methods for selecting predictors are rigorously compared to determine which is the most accurate and generalizable. Model forecasts show that suitable predictors can precisely quantify the risk over time of biological events such as harmful cyanobacterial blooms.
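The evaluation quantities named in the abstract can be illustrated with a minimal Python sketch. This is our illustration, not the authors' code, and it does not implement SuRFCox itself. It computes false positives and sensitivity of a selected taxon set against the truly informative set in a simulation. It computes Harrell's concordance index as one common measure of survival prediction accuracy (an assumption; the paper's accuracy metric may differ). It also generates leave-one-season-out splits that mirror cross-validated testing by year. All function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Set, Tuple


def selection_metrics(selected: Set[str], truth: Set[str]) -> Tuple[int, float]:
    """False positives and sensitivity of a selected predictor set,
    measured against the truly informative predictors of a simulation."""
    false_positives = len(selected - truth)
    sensitivity = len(selected & truth) / len(truth) if truth else float("nan")
    return false_positives, sensitivity


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Harrell's C-index: among comparable pairs, the fraction in which the
    sample with the earlier observed event has the higher predicted risk.
    Pairs with tied times are skipped; tied risks count as half-concordant."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored samples cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def leave_one_season_out(seasons: Sequence[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held-out season, train indices, test indices), one fold per season,
    so each bloom season is predicted by a model fitted on the other seasons."""
    for held_out in sorted(set(seasons)):
        train = [k for k, s in enumerate(seasons) if s != held_out]
        test = [k for k, s in enumerate(seasons) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Toy usage with made-up values (not data from the paper).
    print(selection_metrics({"taxon_a", "taxon_c"}, {"taxon_a", "taxon_b"}))
    print(concordance_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0]))
    for season, train, test in leave_one_season_out([2015, 2015, 2016]):
        print(season, train, test)
```

In practice, scikit-learn's `LeaveOneGroupOut` and scikit-survival's `concordance_index_censored` cover the last two pieces. The pure-Python versions here simply keep the sketch dependency-free.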

MeSH Terms

Cyanobacteria
DNA, Environmental
Harmful Algal Bloom
Microbiota
Seasons

Chemicals

DNA, Environmental