Evaluation of machine learning and deep learning models for daily air quality index prediction in Delhi city, India.

Chaitanya Baliram Pande, Latha Radhadevi, Murthy Bandaru Satyanarayana
Author Information
  1. Chaitanya Baliram Pande: Indian Institute of Tropical Meteorology, NCL Post, Dr. Homi Bhabha Road, Pune, 411008, India. chaitanay45@gmail.com.
  2. Latha Radhadevi: Indian Institute of Tropical Meteorology, NCL Post, Dr. Homi Bhabha Road, Pune, 411008, India.
  3. Murthy Bandaru Satyanarayana: Indian Institute of Tropical Meteorology, NCL Post, Dr. Homi Bhabha Road, Pune, 411008, India.

Abstract

The air quality index (AQI), based on criteria for air contaminants, is defined to provide a shared understanding of air quality. As air pollution continues to rise in cities worldwide due to urbanization and climate change, every city needs monitoring and forecasting models that gather and forecast information about air pollutant concentrations. Air quality predictions have become increasingly useful for air quality management. Recently, machine learning (ML) and artificial intelligence (AI) have improved the performance of air quality forecasting in urban cities in India. This paper focuses on air pollution as a significant ecological problem that directly affects human health and the distribution of environmental systems in urban areas. Hence, we developed advanced models for daily AQI forecasting to anticipate air pollution levels in the coming days. In this research, six data-driven models were developed and implemented for daily AQI forecasting in the study area; understanding future air pollution levels is crucial for planning and controlling air pollution across the entire city. The developed models were applied to air quality datasets. A comparison of the ML models tested here indicates that, in the testing phase, the extreme gradient boosting (XGBoost) algorithm achieves the highest coefficient of determination (R), 0.99, and the lowest root-mean-square error (RMSE), 4.65, among the models. The artificial neural network (ANN) performs slightly below the XGBoost model, with R, mean squared error (MSE), and RMSE values of 0.99, 13.99, and 198.88, respectively. All models were also evaluated with ten-fold cross-validation. Under cross-validation, however, the random forest (RF) model outperforms the other models, with R, RMSE, and MSE values of 0.99, 3.64, and 4.12, respectively. This study also employed two interpretability methods, namely feature importance analysis and Shapley additive explanations (SHAP), to provide both global and local explanations in a manner that is independent of the specific ML model. The feature importance analysis shows that particulate matter (PM2.5 and PM10), carbon monoxide (CO), and nitrogen oxides (NO) were the most influential variables. The results indicate that such novel deep learning (DL) and ML models may improve the accuracy of AQI forecasts and the understanding of air pollution, particularly in metropolitan cities.
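The evaluation protocol named in the abstract (R, MSE, and RMSE scores combined with ten-fold cross-validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the one-feature least-squares line stands in for the six models of the study, and the helper names (`k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical. It is not the authors' pipeline.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for y = a * x + b (stand-in model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def cross_validate(x, y, k=10):
    """Return the held-out RMSE of each of the k folds."""
    scores = []
    for fold in k_fold_indices(len(x), k):
        held_out = set(fold)
        x_train = [x[i] for i in range(len(x)) if i not in held_out]
        y_train = [y[i] for i in range(len(y)) if i not in held_out]
        a, b = fit_line(x_train, y_train)
        y_pred = [a * x[i] + b for i in fold]
        y_test = [y[i] for i in fold]
        scores.append(rmse(y_test, y_pred))
    return scores


# Synthetic data (assumption): AQI as a noisy linear function of PM2.5.
rng = random.Random(42)
pm25 = [rng.uniform(20, 300) for _ in range(200)]
aqi = [1.2 * v + 30 + rng.gauss(0, 5) for v in pm25]

fold_rmse = cross_validate(pm25, aqi, k=10)
mean_rmse = sum(fold_rmse) / len(fold_rmse)
print(f"mean RMSE over {len(fold_rmse)} folds: {mean_rmse:.2f}")
```

Averaging the score over ten held-out folds gives an estimate that depends less on a single train/test split than a one-off testing-phase score does.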

MeSH Term

India
Air Pollution
Cities
Machine Learning
Air Pollutants
Environmental Monitoring
Deep Learning
Particulate Matter
Forecasting

Chemicals

Air Pollutants
Particulate Matter
