VISTA: A Tool for Fast Taxonomic Assignment of Viral Genome Sequences.

Tao Zhang 张韬, Yiyun Liu 刘依云, Xutong Guo 郭栩彤, Xinran Zhang 张欣然, Xinchang Zheng 郑欣畅, Mochen Zhang 张陌尘, Yiming Bao 鲍一明
Author Information
  1. Tao Zhang 张韬: National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China.
  2. Yiyun Liu 刘依云: National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China.
  3. Xutong Guo 郭栩彤: National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China.
  4. Xinran Zhang 张欣然: National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China.
  5. Xinchang Zheng 郑欣畅: National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China.
  6. Mochen Zhang 张陌尘: National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China.
  7. Yiming Bao 鲍一明: National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China.

Abstract

The number of viral genome sequences in public databases is expanding rapidly, and comprehensive virus studies need a scalable, universal, and automated framework for preliminary taxonomic assignment. Here, we introduce Virus Sequence-based Taxonomy Assignment (VISTA), a computational tool for virus taxonomy that combines a novel pairwise sequence comparison system with a framework that identifies demarcation thresholds automatically. VISTA builds its distance-based assignment framework from physicochemical property sequences, k-mer profiles, and machine learning techniques. It serves the same purpose as Pairwise Sequence Comparison (PASC), a widely used virus assignment tool based on pairwise sequence comparison, but it separates taxonomic groups significantly better, yields more objective demarcation thresholds, runs much faster, and has a wider application scope. We successfully applied VISTA to 38 virus families and to the class Caudoviricetes, which demonstrates its scalability, its robustness, and its ability to assign taxonomy automatically and accurately to both prokaryotic and eukaryotic viruses. Applying VISTA to 679 unclassified prokaryotic virus genomes recovered from metagenomic data further identified 46 novel virus families. VISTA is available as both a command line tool and a user-friendly web portal at https://ngdc.cncb.ac.cn/vista.
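
To make the idea of a distance-based framework with a demarcation threshold more concrete, the following is a minimal sketch, not VISTA's actual implementation. It assumes a plain normalized k-mer frequency profile and a cosine distance; the abstract does not specify VISTA's distance measure, its use of physicochemical property sequences, or its machine learning step, so none of those are reproduced here. The function names (`kmer_profile`, `cosine_distance`, `same_taxon`), the choice of k = 3, and the threshold value are all hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=3):
    """Return the normalized frequency of every A/C/G/T k-mer in seq.

    Hypothetical helper: k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def cosine_distance(p, q):
    """Cosine distance between two profiles (an assumed stand-in measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # An all-zero profile carries no information; treat it as maximally distant.
    return 1.0 - dot / norm if norm else 1.0


def same_taxon(seq_a, seq_b, threshold, k=3):
    """Assign two genomes to the same group if their distance is within threshold.

    The threshold is supplied by the caller here; VISTA is described as
    identifying such thresholds automatically, which this sketch does not do.
    """
    distance = cosine_distance(kmer_profile(seq_a, k), kmer_profile(seq_b, k))
    return distance <= threshold
```

For example, `same_taxon(genome_a, genome_b, threshold=0.05)` would compare two genome strings against an illustrative cutoff. In practice a threshold would be chosen per taxonomic rank from the distribution of pairwise distances rather than fixed by hand.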

MeSH Terms

Genome, Viral
Software
Viruses
Phylogeny
Computational Biology
