A novel seasonal index-based machine learning approach for air pollution forecasting.

Adeel Khan, Sumit Sharma, Kaushik Roy Chowdhury, Prateek Sharma
Author Information
  1. Adeel Khan: Council On Energy, Environment and Water, New Delhi, 110016, India.
  2. Sumit Sharma: TERI, The Energy and Resources Institute, IHC Complex, Lodi Road, New Delhi, 110003, India. sumit4879@gmail.com.
  3. Kaushik Roy Chowdhury: Accenture Solutions Pvt. Ltd, Gurgaon, Haryana, 122002, India.
  4. Prateek Sharma: TERI School of Advanced Studies, New Delhi, 110070, India.

Abstract

Novel machine learning models (MLMs) based on a seasonal indexing approach, which captures the variation in air quality caused by meteorological changes, have been used to provide short-term, real-time forecasts of PM concentration for one of the most polluted air quality control regions (AQCR) in the capital city of Delhi. Two MLMs, multi-linear regression and random forest, were developed using time series data for 1-h and 24-h average PM concentration. Various model performance evaluation indices indicate satisfactory model performance. R values decreased from 0.95 to 0.72 for the hourly models (1st to 5th hour) and from 0.76 to 0.68 for the daily models (1st to 5th day). The lagged values of PM concentration (persistence) and the hourly and daily indices are the most influential variables for forecasts of the immediate time steps. In contrast, seasonal indices become more important as the forecasting horizon lengthens. The developed models can be used to make short-term, real-time air quality forecasts and to issue a warning when pollution levels exceed acceptable limits.
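
To make the idea of index-based features concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state how the indices are defined, so the ratio-to-mean definition used here (mean concentration in a period divided by the overall mean) is an assumption, as are the function names, the number of lags, and the toy data. The sketch builds lagged (persistence) features plus the index of the target hour, which could then be passed to a multi-linear regression or random forest, and computes the correlation coefficient R used to score forecasts.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def seasonal_indices(values, periods):
    """Ratio-to-mean index (assumed definition): period mean / overall mean."""
    overall = mean(values)
    groups = defaultdict(list)
    for value, period in zip(values, periods):
        groups[period].append(value)
    return {period: mean(vals) / overall for period, vals in groups.items()}


def build_features(values, hours, hour_index, n_lags=2, horizon=1):
    """Build rows of [lag_0, ..., lag_{n_lags-1}, index of target hour].

    lag_0 is the most recent observation; the target is the value
    `horizon` steps ahead of it.
    """
    X, y = [], []
    for t in range(n_lags - 1, len(values) - horizon):
        lags = [values[t - k] for k in range(n_lags)]
        target_t = t + horizon
        X.append(lags + [hour_index[hours[target_t]]])
        y.append(values[target_t])
    return X, y


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((z - mb) ** 2 for z in b))
    return cov / (sa * sb)


# Toy data (hypothetical): PM alternates between two hours of the day.
pm = [100, 200, 100, 200, 100, 200]
hours = [0, 1, 0, 1, 0, 1]

hour_index = seasonal_indices(pm, hours)
X, y = build_features(pm, hours, hour_index, n_lags=2, horizon=1)
```

In this toy series the index for hour 1 is twice that for hour 0, so the index column alone separates high and low targets; with real data, daily and seasonal indices could be appended to each row in the same way.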

MeSH Term

Air Pollutants
Air Pollution
Environmental Monitoring
Forecasting
Machine Learning
Particulate Matter
Seasons

Chemicals

Air Pollutants
Particulate Matter
