Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics.

Congyu Lu, Zheng Zhang, Zena Cai, Zhaozhong Zhu, Ye Qiu, Aiping Wu, Taijiao Jiang, Heping Zheng, Yousong Peng
Author Information
  1. Congyu Lu: Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China.
  2. Zheng Zhang: Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China.
  3. Zena Cai: Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China.
  4. Zhaozhong Zhu: Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China.
  5. Ye Qiu: Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China.
  6. Aiping Wu: Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China.
  7. Taijiao Jiang: Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China.
  8. Heping Zheng: Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China.
  9. Yousong Peng: Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China. pys2013@hnu.edu.cn.

Abstract

BACKGROUND: Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools to determine their taxonomic assignment, host range, and biological properties. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the virus relies on the host for survival. Currently, the viral host is determined either by culturing the virus, which is low-throughput, time-consuming, and expensive, or by computational prediction, which needs improvement in both accuracy and usability. Here we develop a Gaussian model to predict hosts for prokaryotic viruses with better performance than previous computational methods.
RESULTS: We present Prokaryotic virus Host Predictor (PHP), a software tool that uses a Gaussian model to predict hosts for prokaryotic viruses, with the differences in k-mer frequencies between viral and host genomic sequences as features. PHP gave a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and a host prediction accuracy of 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. The prediction accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28-34%, genus level). PHP also substantially outperformed these two alignment-free methods (24-38% vs 18-20%, genus level) when predicting hosts for prokaryotic viruses whose hosts cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
CONCLUSIONS: The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
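
The approach summarized in the abstract (k-mer frequency differences as features, a Gaussian score, thresholding, and a consensus over the top predictions) might be sketched roughly as follows. This is an illustrative sketch only and not the PHP implementation: the function names, the choice of k = 4, the diagonal (independent) Gaussian, and the majority-vote consensus rule are all assumptions made for the example, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=4):
    """Relative frequency of every A/C/G/T k-mer in seq (k=4 is an assumption)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # k-mers with other letters are ignored
    return [counts[m] / total if total else 0.0 for m in kmers]


def difference_features(virus_seq, host_seq, k=4):
    """Feature vector: virus k-mer frequencies minus host k-mer frequencies."""
    v = kmer_frequencies(virus_seq, k)
    h = kmer_frequencies(host_seq, k)
    return [a - b for a, b in zip(v, h)]


def gaussian_log_score(features, means, variances):
    """Log-density of the features under a diagonal Gaussian.

    The means and variances are hypothetical parameters that would be
    estimated from known virus-host pairs.
    """
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x, mu, var in zip(features, means, variances)
    )


def consensus_genus(scored_hosts, top_n=30, min_score=None):
    """Most common genus among the top_n highest-scoring candidate hosts.

    scored_hosts is a list of (genus, score) pairs. Returns None when the
    best score falls below min_score (the thresholding step).
    """
    ranked = sorted(scored_hosts, key=lambda t: t[1], reverse=True)
    if not ranked or (min_score is not None and ranked[0][1] < min_score):
        return None
    votes = Counter(genus for genus, _ in ranked[:top_n])
    return votes.most_common(1)[0][0]
```

In this sketch, a virus would be scored against each candidate host genome with `gaussian_log_score(difference_features(virus, host), means, variances)`, and the resulting (genus, score) pairs passed to `consensus_genus`.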

Grants

  1. 2016YFD0500300/National Key Plan for Scientific Research and Development of China
  2. 2018JJ3039/Hunan Provincial Natural Science Foundation of China
  3. 2019JJ50035/Hunan Provincial Natural Science Foundation of China
  4. 2020JJ3006/Hunan Provincial Natural Science Foundation of China
  5. 31671371/National Natural Science Foundation of China
  6. 81902070/National Natural Science Foundation of China
  7. 2016-I2M-1-005/Chinese Academy of Medical Sciences

MeSH Terms

Archaeal Viruses
Bacteriophages
Host-Pathogen Interactions
Metagenomics
Models, Biological
Normal Distribution
Software
