TEMPO: A transformer-based mutation prediction framework for SARS-CoV-2 evolution.
Binbin Zhou, Hang Zhou, Xue Zhang, Xiaobin Xu, Yi Chai, Zengwei Zheng, Alex Chichung Kot, Zhan Zhou
Author Information
Binbin Zhou: Department of Computer Science and Computing, Zhejiang University City College, No. 48 Huzhou Street, Hangzhou, 310015, China; Industry Brain Institute, Zhejiang University City College, Hangzhou, 310015, China. Electronic address: bbzhou@zucc.edu.cn.
Hang Zhou: Department of Computer Science and Computing, Zhejiang University City College, No. 48 Huzhou Street, Hangzhou, 310015, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China. Electronic address: hangz@zju.edu.cn.
Xue Zhang: Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China. Electronic address: 22119130@zju.edu.cn.
Xiaobin Xu: Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China. Electronic address: 3190104201@zju.edu.cn.
Yi Chai: ZJU-UoE Institute, Zhejiang University, Haining, 314400, China. Electronic address: ychai@u.nus.edu.
Zengwei Zheng: Department of Computer Science and Computing, Zhejiang University City College, No. 48 Huzhou Street, Hangzhou, 310015, China; Industry Brain Institute, Zhejiang University City College, Hangzhou, 310015, China. Electronic address: zhengzw@zucc.edu.cn.
Alex Chichung Kot: School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore. Electronic address: eackot@ntu.edu.sg.
Zhan Zhou: Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China; Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 310058, China. Electronic address: zhanzhou@zju.edu.cn.
The widespread transmission of SARS-CoV-2 poses a significant threat to public health, economic development, and human society at large. Extensive efforts have been made to combat the pandemic, but continuous viral mutation can weaken effective countermeasures such as vaccination, which has drawn considerable attention to mutation prediction. However, most previous studies pay little attention to phylogenetics. In this paper, we propose TEMPO, a novel and effective model for predicting mutations in SARS-CoV-2 evolution. Specifically, we design a phylogenetic tree-based sampling method to generate sequence evolution data. A transformer-based model then learns a high-level representation of these sequence data and predicts site mutations. We conduct experiments on a large-scale SARS-CoV-2 dataset to verify the effectiveness of TEMPO. Experimental results show that TEMPO is effective for mutation prediction in SARS-CoV-2 evolution and outperforms several state-of-the-art baseline methods. We further perform mutation prediction experiments on other infectious viruses to explore the feasibility and robustness of TEMPO, and the results verify its superiority. The code and datasets are freely available at https://github.com/ZJUDataIntelligence/TEMPO.
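The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of what phylogenetic tree-based sampling could look like, under stated assumptions. It assumes that each training sample is a root-to-leaf path through the tree, that the sequences along the path (except the last) form the evolutionary history, and that the per-site label marks whether the final sequence differs from its parent. The toy tree, the sequences, and the function names (`sample_path`, `site_labels`, `make_sample`) are hypothetical and are not taken from the TEMPO repository.

```python
import random

# Hypothetical toy phylogeny: each node maps to its list of child nodes.
TREE = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": [], "A2": [], "B1": [],
}

# Hypothetical aligned amino-acid sequences attached to each node.
SEQS = {
    "root": "MFVFL",
    "A": "MFVFL",
    "B": "MFIFL",
    "A1": "MYVFL",
    "A2": "MFVFP",
    "B1": "MFIFL",
}


def sample_path(tree, root, rng):
    """Random walk from the root down to a leaf (assumed sampling scheme)."""
    path = [root]
    while tree[path[-1]]:
        path.append(rng.choice(tree[path[-1]]))
    return path


def site_labels(parent_seq, child_seq):
    """Per-site binary labels: 1 where the child differs from its parent."""
    return [int(a != b) for a, b in zip(parent_seq, child_seq)]


def make_sample(tree, seqs, root, rng):
    """Build one (path, history, labels) training sample from a sampled path."""
    path = sample_path(tree, root, rng)
    history = [seqs[node] for node in path[:-1]]  # ancestors, in order
    labels = site_labels(seqs[path[-2]], seqs[path[-1]])
    return path, history, labels


rng = random.Random(0)
path, history, labels = make_sample(TREE, SEQS, "root", rng)
print(path, history, labels)
```

In a setup like this, the ordered `history` sequences would be the input to a sequence model such as a transformer encoder, and `labels` would be the per-site prediction target. How TEMPO actually encodes the sequences and defines its targets should be checked against the paper and the released code.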