Integrating Internet multisource big data to predict the occurrence and development of COVID-19 cryptic transmission.

Chengcheng Gao, Rui Zhang, Xicheng Chen, Tianhua Yao, Qiuyue Song, Wei Ye, PengPeng Li, Zhenyan Wang, Dong Yi, Yazhou Wu
Author Information
  1. Chengcheng Gao: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China.
  2. Rui Zhang: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China.
  3. Xicheng Chen: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China.
  4. Tianhua Yao: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China.
  5. Qiuyue Song: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China.
  6. Wei Ye: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China.
  7. PengPeng Li: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China.
  8. Zhenyan Wang: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China.
  9. Dong Yi: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China. yd_house@hotmail.com.
  10. Yazhou Wu: Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China. asiawu@tmmu.edu.cn.

Abstract

As COVID-19 continues to spread, cryptic transmission deserves closer attention and study. Perceiving the risk that cryptic transmission will occur and develop at an early stage is an important part of controlling the spread of COVID-19. Previous studies have relied on limited data sources and have not effectively analyzed the occurrence and development of cryptic transmission. We therefore collected Internet multisource big data (including retrieval, migration, and media data) and proposed comprehensive and relative application strategies to eliminate the impact of national and media data. We used statistical classification and regression to construct early warning models for occurrence and development. Guided by the improved coronavirus herd immunity optimizer (ICHIO), we constructed a "sampling-feature-hyperparameter-weight" synchronous optimization strategy. For occurrence warning, we propose an undersampling synchronous evolutionary ensemble (USEE); for development warning, we propose a bootstrap-sampling synchronous evolutionary ensemble (BSEE). On the internal training data (Heilongjiang Province), USEE3, which incorporates multisource data, achieved a ROC-AUC of 0.9553 and a PR-AUC of 0.8327, and BSEE2, fused by the "nonlinear + linear" method, achieved an R of 0.8698. On the external validation data (Shaanxi Province), USEE3 achieved a ROC-AUC of 0.9680 and a PR-AUC of 0.9548, and BSEE2 achieved an R of 0.8255. The method shows good accuracy and generalization and can be applied flexibly to predict cryptic transmission in various regions. This work proposes a strategy that integrates multiple early warning tasks based on multisource Internet big data and combines multiple ensemble models. It extends research in traditional infectious disease monitoring and has practical significance and theoretical value.
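
The general idea behind an undersampling ensemble scored by ROC-AUC can be illustrated with a minimal sketch. This is not the authors' USEE: the ICHIO-guided synchronous optimization of sampling, features, hyperparameters, and weights is not reproduced here, and the function names (`balanced_subsets`, `weighted_average`, `roc_auc`), the toy labels, and the toy scores are all hypothetical, chosen only for illustration.

```python
import random


def balanced_subsets(labels, n_subsets, seed=0):
    """Build index lists for ensemble members (illustrative helper).

    Each subset keeps every minority-class sample and adds an equally
    sized random draw, without replacement, from the majority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted([pos, neg], key=len)
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_subsets)]


def weighted_average(member_scores, weights):
    """Fuse per-member score lists into one score per sample."""
    total = sum(weights)
    n_samples = len(member_scores[0])
    return [sum(w * s[i] for w, s in zip(weights, member_scores)) / total
            for i in range(n_samples)]


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which
    the positive sample scores higher; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): 2 "occurrence" days among 10 days.
toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
toy_subsets = balanced_subsets(toy_labels, n_subsets=3)

# Pretend two trained members produced these scores (made up).
toy_member_scores = [
    [0.9, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.4, 0.3, 0.2],
    [0.7, 0.3, 0.2, 0.1, 0.8, 0.1, 0.3, 0.2, 0.2, 0.1],
]
toy_fused = weighted_average(toy_member_scores, weights=[1.0, 1.0])
toy_auc = roc_auc(toy_labels, toy_fused)
```

In a fuller pipeline, each index list would train one base classifier, and the fusion weights would be tuned rather than fixed; here they are set by hand only to keep the sketch self-contained.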

Grants

  1. No. 81872716/National Natural Science Foundation of China (National Science Foundation of China)
  2. No. 82173621/National Natural Science Foundation of China (National Science Foundation of China)
  3. Cstc2020jcyj-zdxmX0017/Natural Science Foundation of Chongqing (Natural Science Foundation of Chongqing Municipality)
