Semi-supervised urban haze pollution prediction based on multi-source heterogeneous data.

Zuhan Liu, Lili Wang
Author Information
  1. Zuhan Liu: School of Information Engineering, Nanchang Institute of Technology, Nanchang, China.
  2. Lili Wang: College of Science, Nanchang Institute of Technology, Nanchang, China.

Abstract

Particulate matter (PM) is defined by the Texas Commission on Environmental Quality (TCEQ) as "a mixture of solid particles and liquid droplets found in the air". These particles vary widely in size; those with an aerodynamic diameter of less than 2.5 µm are known as Particulate Matter 2.5, or PM2.5. Urban haze pollution, represented by PM2.5, is becoming increasingly serious, which makes air pollution monitoring very important. However, because monitoring stations are costly, their number is limited. Our work focuses on integrating multi-source heterogeneous data for Nanchang, China, including taxi tracks, human mobility, road networks, points of interest (POIs), meteorology (e.g., temperature, dew point, humidity, wind speed, wind direction, atmospheric pressure, weather activity, weather conditions), and PM2.5 forecast data from air monitoring stations. This research presents an innovative approach to air quality prediction that integrates these data sets from various sources and uses diverse architectures. To this end, semi-supervised learning techniques are used, namely the collaborative training algorithm Co-Training (Co-T) and its further adjusted variant, Tri-Training (Tri-T). The objective is to accurately estimate haze pollution from these multi-source heterogeneous data. We achieved this for the first time by employing a semi-supervised co-training strategy to estimate pollution levels after applying the U-Air system to environmental data. In particular, the algorithm of the U-Air system is reproduced on the highly diverse heterogeneous data of Nanchang City, and the semi-supervised learning methods Co-T and Tri-T are used to conduct more detailed urban haze pollution prediction. Compared with Co-T, which trains a time classifier (TC) and a subspace classifier (SC) separately from the spatial and temporal perspectives, Tri-T is more accurate and faster, with a testing accuracy of up to 85.62%. The forecast results also demonstrate the potential of the city's multi-source heterogeneous data and the effectiveness of semi-supervised learning. We hope that this synthesis will motivate atmospheric environmental officials, scientists, and environmentalists in China to explore machine learning technology for controlling pollutant discharge and for environmental management.
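
The co-training idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier is a simple stand-in for the TC and SC models, the names `temporal_view`, `spatial_view`, `co_train`, `rounds`, and `per_round` are hypothetical, and the data are synthetic. Two classifiers, one per feature view, take turns labeling the unlabeled samples they are most confident about and add them to a shared labeled set.

```python
def fit_centroids(X, y):
    """Nearest-centroid stand-in classifier: returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_margin(centroids, row):
    """Return (predicted label, margin); a larger margin means higher confidence."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, centroid)), label)
        for label, centroid in centroids.items()
    )
    best = dists[0]
    runner_up = dists[1] if len(dists) > 1 else (float("inf"), None)
    return best[1], runner_up[0] - best[0]


def co_train(view_a, view_b, labels, rounds=5, per_round=2):
    """Co-training over two views; labels[i] is a class or None if unlabeled."""
    labels = list(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):  # the two views alternate as "teacher"
            known = [i for i, lab in enumerate(labels) if lab is not None]
            unknown = [i for i, lab in enumerate(labels) if lab is None]
            if not unknown:
                return labels
            model = fit_centroids([view[i] for i in known],
                                  [labels[i] for i in known])
            scored = [(predict_with_margin(model, view[i]), i) for i in unknown]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            # Only the most confident pseudo-labels join the shared labeled set.
            for (label, _margin), i in scored[:per_round]:
                labels[i] = label
    return labels


# Synthetic, well-separated data (assumption): class 0 = low haze, class 1 = high haze.
truth = [i % 2 for i in range(20)]
temporal_view = [[5.0 * c + 0.1 * (i % 5), 5.0 * c - 0.1 * (i % 3)]
                 for i, c in enumerate(truth)]
spatial_view = [[6.0 * c + 0.2 * (i % 4)] for i, c in enumerate(truth)]

# Only the first four samples start with labels, mimicking sparse monitoring stations.
labels = [truth[i] if i < 4 else None for i in range(20)]
predicted = co_train(temporal_view, spatial_view, labels)
```

Tri-training follows the same pattern but uses three classifiers, with an unlabeled sample typically pseudo-labeled for one classifier when the other two agree on it.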
