Data-driven interpretable analysis for polysaccharide yield prediction.

Yushi Tian, Xu Yang, Nianhua Chen, Chunyan Li, Wulin Yang
Author Information
  1. Yushi Tian: School of Resource and Environment, Northeast Agriculture University, Harbin, 150030, PR China.
  2. Xu Yang: School of Resource and Environment, Northeast Agriculture University, Harbin, 150030, PR China.
  3. Nianhua Chen: School of Resource and Environment, Northeast Agriculture University, Harbin, 150030, PR China.
  4. Chunyan Li: School of Resource and Environment, Northeast Agriculture University, Harbin, 150030, PR China.
  5. Wulin Yang: College of Environmental Sciences and Engineering, Peking University, Beijing, 100871, PR China.

Abstract

Cornstalks are a promising raw material for polysaccharide production using xylanase. Rapid, accurate prediction of polysaccharide yield can facilitate process optimization and reduce the extensive experimentation otherwise needed to refine reaction conditions in production, saving time and cost. However, the intricate interplay of enzymatic factors makes yield difficult to predict and optimize. Here, we introduce a data-driven approach that combines multiple artificial intelligence techniques to enhance polysaccharide production. We propose a machine learning framework to identify accurate yield prediction models and uncover optimal combinations of enzymatic parameters. Notably, Random Forest (RF) and eXtreme Gradient Boosting (XGB) perform robustly, achieving prediction accuracies of 93.0% and 95.6%, respectively, while an independently developed deep neural network (DNN) model achieves 91.1%. Feature importance analysis of the XGB model shows that enzyme solution volume dominates (43.7%), followed by time (20.7%), substrate concentration (15%), temperature (15%), and pH (5.6%). Further interpretability analysis reveals complex parameter interactions and potential optimization strategies. By combining machine learning, deep learning, and interpretable analysis, this approach offers a viable pathway for polysaccharide yield prediction and for the potential recovery of various agricultural residues.
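
The feature importance figures reported for XGB can be checked and ranked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' code: the percentages are copied from the abstract, while the names `reported_importance` and `rank_with_cumulative` are hypothetical helpers introduced here. It sorts the shares, accumulates them, and confirms that they sum to 100%.

```python
# Importance shares (%) for the XGB model, copied from the abstract.
# The dictionary name and the helper below are hypothetical, for illustration only.
reported_importance = {
    "enzyme solution volume": 43.7,
    "time": 20.7,
    "substrate concentration": 15.0,
    "temperature": 15.0,
    "pH": 5.6,
}


def rank_with_cumulative(shares):
    """Return (feature, share, cumulative share) tuples, largest share first."""
    # sorted() is stable, so features with equal shares keep their input order.
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for name, share in ranked:
        running += share
        # Round the running total to one decimal to hide float noise.
        rows.append((name, share, round(running, 1)))
    return rows


rows = rank_with_cumulative(reported_importance)
total = round(sum(reported_importance.values()), 1)

for name, share, cumulative in rows:
    print(f"{name:<25}{share:>6.1f}%{cumulative:>8.1f}%")
print(f"total: {total}%")
```

Read this way, enzyme solution volume and time together account for 64.4% of the reported importance, which suggests these two parameters are the natural first candidates when narrowing the search for optimal conditions.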
