Machine Learning Early Detection of SARS-CoV-2 High-Risk Variants.

Lun Li, Cuiping Li, Na Li, Dong Zou, Wenming Zhao, Hong Luo, Yongbiao Xue, Zhang Zhang, Yiming Bao, Shuhui Song
Author Information
  1. Lun Li: China National Center for Bioinformation, Beijing, 100101, China.
  2. Cuiping Li: China National Center for Bioinformation, Beijing, 100101, China.
  3. Na Li: China National Center for Bioinformation, Beijing, 100101, China.
  4. Dong Zou: China National Center for Bioinformation, Beijing, 100101, China.
  5. Wenming Zhao: China National Center for Bioinformation, Beijing, 100101, China.
  6. Hong Luo: China National Center for Bioinformation, Beijing, 100101, China.
  7. Yongbiao Xue: China National Center for Bioinformation, Beijing, 100101, China.
  8. Zhang Zhang: China National Center for Bioinformation, Beijing, 100101, China.
  9. Yiming Bao: China National Center for Bioinformation, Beijing, 100101, China.
  10. Shuhui Song: China National Center for Bioinformation, Beijing, 100101, China.

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has evolved many high-risk variants, resulting in repeated COVID-19 waves in recent years. Accurate early warning of high-risk variants is therefore vital for epidemic prevention and control. However, identifying high-risk variants through experimental and epidemiological research is time-consuming and often lags behind their emergence and spread. In this study, HiRisk-Detector, a machine learning algorithm based on haplotype networks, is developed for the early computational detection of high-risk SARS-CoV-2 variants. Using more than 7.6 million high-quality, complete SARS-CoV-2 genomes and their metadata, the effectiveness, robustness, and generalizability of HiRisk-Detector are validated. First, evaluated on real-world empirical data, HiRisk-Detector detects all 13 high-risk variants, on average 27 days before the corresponding World Health Organization announcements. Second, its robustness is tested by reducing sequencing intensity to one-fourth, which delays detection by only 3.8 days, demonstrating its robustness. Third, HiRisk-Detector is applied to assess risk among SARS-CoV-2 Omicron sub-lineages, where it achieves high ROC-AUC and PR-AUC, confirming its broad applicability. Overall, HiRisk-Detector offers a powerful capacity for the early detection of high-risk variants and holds great potential for responding to public health emergencies caused by infectious diseases or viruses.
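The abstract summarizes performance with three figures: the mean lead time relative to World Health Organization announcements, ROC-AUC, and PR-AUC. The sketch below shows one common way such figures are computed. It is not HiRisk-Detector, and the abstract does not say exactly how the authors computed these metrics. The function names (`mean_lead_time_days`, `roc_auc`, `average_precision`) and all dates, labels, and scores are hypothetical. PR-AUC is summarized here as average precision, an assumed choice.

```python
from datetime import date
from statistics import mean


def mean_lead_time_days(detected, announced):
    """Hypothetical helper: average days by which detection precedes announcement.

    Both arguments map a variant name to a datetime.date; only variants
    present in both mappings are compared.
    """
    shared = detected.keys() & announced.keys()
    return mean((announced[v] - detected[v]).days for v in shared)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Assumed PR-AUC summary: mean precision at the rank of each positive.

    Items are ranked by descending score; ties keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


if __name__ == "__main__":
    # Hypothetical detection and announcement dates for two made-up variants.
    detected = {"variant_A": date(2021, 1, 1), "variant_B": date(2021, 3, 1)}
    announced = {"variant_A": date(2021, 1, 31), "variant_B": date(2021, 3, 21)}
    print(mean_lead_time_days(detected, announced))  # 30 and 20 days -> 25

    # Hypothetical labels (1 = high-risk) and model scores for six lineages.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
    print(roc_auc(labels, scores))            # 8 of 9 positive/negative pairs ordered correctly
    print(average_precision(labels, scores))  # (1 + 1 + 3/4) / 3
```

ROC-AUC is written in its rank (Mann-Whitney) form rather than by integrating an explicit curve. Both give the same value, and the rank form keeps the sketch free of plotting or library dependencies.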

Grants

  1. 2023YFC3041500/Ministry of Science and Technology of the People's Republic of China
  2. ANSO-CR-KP-2022-09/Alliance of National and International Science Organizations for the Belt and Road Regions
  3. Z211100002121006/Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park
  4. 2021YFF0703703/National Key Research & Development Program of China
  5. 2023YFC2604400/National Key Research & Development Program of China
  6. XDB38030200/Chinese Academy of Sciences
  7. Y2021038/Chinese Academy of Sciences
  8. 32170678/National Natural Science Foundation of China
  9. 32270718/National Natural Science Foundation of China