Lun Li, Cuiping Li, Na Li, Dong Zou, Wenming Zhao, Hong Luo, Yongbiao Xue, Zhang Zhang, Yiming Bao, Shuhui Song
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has evolved many high-risk variants, resulting in repeated COVID-19 waves over the past years. Accurate early warning of high-risk variants is therefore vital for epidemic prevention and control. However, detecting high-risk variants through experimental and epidemiological research is time-consuming and often lags behind the emergence and spread of these variants. In this study, HiRisk-Detector, a machine learning algorithm based on haplotype networks, is developed for the early computational detection of high-risk SARS-CoV-2 variants. Its effectiveness, robustness, and generalizability are validated using over 7.6 million high-quality, complete SARS-CoV-2 genomes and their metadata. First, HiRisk-Detector is evaluated on empirical data, where it detects all 13 high-risk variants and precedes the World Health Organization announcements by 27 days on average. Second, its robustness is tested by reducing sequencing intensity to one-fourth, which delays detection by only 3.8 days. Third, HiRisk-Detector is applied to detect risk among sub-lineages of the SARS-CoV-2 Omicron variant, where it achieves high ROC-AUC and PR-AUC, confirming its broad applicability. Overall, HiRisk-Detector shows strong capacity for early detection of high-risk variants and could be of great utility in public health emergencies caused by infectious diseases or viruses.
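The evaluation quantities named in the abstract (average lead time over official announcements, ROC-AUC, and PR-AUC) can be illustrated with a minimal Python sketch. This is not the HiRisk-Detector implementation; the function names, variant labels, dates, and scores below are hypothetical and serve only to show how such metrics are commonly computed. PR-AUC is approximated here by average precision.

```python
from datetime import date

def mean_lead_days(detected, announced):
    """Average number of days by which detection precedes the announcement."""
    leads = [(announced[v] - detected[v]).days for v in detected]
    return sum(leads) / len(leads)

def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """PR-AUC approximated as average precision over the score-ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each positive hit
    return total / hits

# Hypothetical data: two variants and five scored sub-lineages.
detected = {"variant_A": date(2021, 1, 1), "variant_B": date(2021, 3, 1)}
announced = {"variant_A": date(2021, 1, 21), "variant_B": date(2021, 3, 31)}
labels = [1, 0, 1, 0, 0]            # 1 = high-risk (assumed ground truth)
scores = [0.9, 0.8, 0.7, 0.3, 0.1]  # assumed model risk scores

lead = mean_lead_days(detected, announced)
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
```

A positive mean lead time means detection came before the announcement; ROC-AUC and average precision both lie in [0, 1], with higher values indicating better ranking of high-risk items.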
Adv Sci (Weinh). 2024 Dec;11(45):e2405058 [PMID: 39401400]