TEMPO: A transformer-based mutation prediction framework for SARS-CoV-2 evolution.

Binbin Zhou, Hang Zhou, Xue Zhang, Xiaobin Xu, Yi Chai, Zengwei Zheng, Alex Chichung Kot, Zhan Zhou
Author Information
  1. Binbin Zhou: Department of Computer Science and Computing, Zhejiang University City College, No. 48 Huzhou Street, Hangzhou, 310015, China; Industry Brain Institute, Zhejiang University City College, Hangzhou, 310015, China. Electronic address: bbzhou@zucc.edu.cn.
  2. Hang Zhou: Department of Computer Science and Computing, Zhejiang University City College, No. 48 Huzhou Street, Hangzhou, 310015, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China. Electronic address: hangz@zju.edu.cn.
  3. Xue Zhang: Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China. Electronic address: 22119130@zju.edu.cn.
  4. Xiaobin Xu: Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China. Electronic address: 3190104201@zju.edu.cn.
  5. Yi Chai: ZJU-UoE Institute, Zhejiang University, Haining, 314400, China. Electronic address: ychai@u.nus.edu.
  6. Zengwei Zheng: Department of Computer Science and Computing, Zhejiang University City College, No. 48 Huzhou Street, Hangzhou, 310015, China; Industry Brain Institute, Zhejiang University City College, Hangzhou, 310015, China. Electronic address: zhengzw@zucc.edu.cn.
  7. Alex Chichung Kot: School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore. Electronic address: eackot@ntu.edu.sg.
  8. Zhan Zhou: Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China; Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 310058, China. Electronic address: zhanzhou@zju.edu.cn.

Abstract

The widespread transmission of SARS-CoV-2 presents a significant threat to human society, public health, and economic development. Extensive efforts have been made to combat the pandemic, but the effectiveness of measures such as vaccination is weakened by the virus's continuous mutation, which has drawn considerable attention to mutation prediction. However, most previous studies pay little attention to phylogenetics. In this paper, we propose TEMPO, a novel model for predicting mutations in SARS-CoV-2 evolution. Specifically, we design a phylogenetic tree-based sampling method to generate sequence evolution data. A transformer-based model then learns a high-level representation of these sequence data and predicts site mutations. We conduct experiments on a large-scale SARS-CoV-2 dataset to verify the effectiveness of TEMPO. Experimental results show that TEMPO is effective for mutation prediction in SARS-CoV-2 evolution and outperforms several state-of-the-art baseline methods. We further perform mutation prediction experiments on other infectious viruses to explore the feasibility and robustness of TEMPO, and the results verify its superiority. The code and datasets are freely available at https://github.com/ZJUDataIntelligence/TEMPO.
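
The phylogenetic tree-based sampling step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy tree (PARENT), the node sequences (SEQS), the uniform choice of leaves, and the helper names are all hypothetical, chosen only to show how root-to-leaf paths might be turned into per-site training examples that a sequence model could consume.

```python
import random

# Hypothetical toy phylogeny: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "A": "root",
    "B": "root",
    "A1": "A",
    "A2": "A",
    "B1": "B",
}

# Hypothetical aligned sequences at each node (all the same length).
SEQS = {
    "root": "MFVFL",
    "A": "MFVFL",
    "B": "MFIFL",
    "A1": "MYVFL",
    "A2": "MFVFV",
    "B1": "MFIFL",
}


def path_to_root(node, parent):
    """Return node names ordered from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def sample_evolution_paths(parent, n_samples, rng):
    """Sample root-to-leaf paths, choosing leaves uniformly at random.

    Uniform leaf sampling is an assumption made for this sketch.
    """
    internal = {p for p in parent.values() if p is not None}
    leaves = sorted(n for n in parent if n not in internal)
    return [path_to_root(rng.choice(leaves), parent) for _ in range(n_samples)]


def site_example(path, seqs, site):
    """Build one (history, label) pair for a single site.

    history: residues at `site` along the path, excluding the leaf.
    label: 1 if the leaf residue differs from its parent's residue, else 0.
    """
    residues = [seqs[name][site] for name in path]
    history, last = residues[:-1], residues[-1]
    return history, int(last != history[-1])


rng = random.Random(0)
paths = sample_evolution_paths(PARENT, n_samples=4, rng=rng)
dataset = [site_example(p, SEQS, site) for p in paths for site in range(5)]
```

Each entry of dataset pairs the residue history at one site with a binary mutation label. In a full pipeline, such histories would be encoded and passed to a transformer-based classifier, which this sketch leaves out.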

MeSH Term

Humans
SARS-CoV-2
COVID-19
Phylogeny
Mutation
Pandemics