Assessing predictability of environmental time series with statistical and machine learning models.

Matthew Bonas, Abhirup Datta, Christopher K Wikle, Edward L Boone, Faten S Alamri, Bhava Vyasa Hari, Indulekha Kavila, Susan J Simmons, Shannon M Jarvis, Wesley S Burr, Daniel E Pagendam, Won Chang, Stefano Castruccio
Author Information
  1. Matthew Bonas: Dept. of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, Indiana, USA.
  2. Abhirup Datta: Dept. of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA.
  3. Christopher K Wikle: Dept. of Statistics, University of Missouri, Columbia, Missouri, USA.
  4. Edward L Boone: Dept. of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, Virginia, USA.
  5. Faten S Alamri: Dept. of Mathematical Sciences, College of Science, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia.
  6. Bhava Vyasa Hari: Wipro Limited, Bengaluru, India.
  7. Indulekha Kavila: School of Pure and Applied Physics, Mahatma Gandhi University, Kottayam, India.
  8. Susan J Simmons: Institute for Advanced Analytics, North Carolina State University, Raleigh, North Carolina, USA.
  9. Shannon M Jarvis: Dept. of Mathematics, Trent University, Peterborough, Ontario, Canada.
  10. Wesley S Burr: Dept. of Mathematics, Trent University, Peterborough, Ontario, Canada.
  11. Daniel E Pagendam: CSIRO Data61, Eveleigh, Brisbane, Australia.
  12. Won Chang: Div. of Statistics and Data Science, University of Cincinnati, Cincinnati, Ohio, USA.
  13. Stefano Castruccio: Dept. of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, Indiana, USA.

Abstract

The ever-increasing popularity of machine learning methods in virtually all areas of science, engineering and beyond is poised to call established statistical modeling approaches into question. Environmental statistics is no exception, as popular constructs such as neural networks and decision trees are now routinely used to forecast physical processes ranging from air pollution to meteorology. This presents both challenges and opportunities to the statistical community, which could contribute to the machine learning literature a model-based approach with formal uncertainty quantification. Should, however, classical statistical methodologies be discarded altogether in environmental statistics, and should our contribution be focused on formalizing machine learning constructs? This work aims to provide some answers to this thought-provoking question with two time series case studies in which selected models from both the statistical and machine learning literature are compared in terms of forecasting skill, uncertainty quantification and computational time. The relative merits of both classes of approaches are discussed, and broad open questions are formulated as a baseline for a discussion on the topic.

Grants

  1. R01 ES033739/NIEHS NIH HHS
