Prokaryotic virus host prediction with graph contrastive augmentation.

Zhi-Hua Du, Jun-Peng Zhong, Yun Liu, Jian-Qiang Li
Author Information
  1. Zhi-Hua Du: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China.
  2. Jun-Peng Zhong: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China.
  3. Yun Liu: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China.
  4. Jian-Qiang Li: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China.

Abstract

Prokaryotic viruses, also known as bacteriophages, play crucial roles in regulating microbial communities and have potential applications in phage therapy. Accurate prediction of phage-host interactions is essential for understanding the dynamics of these viruses and their impact on bacterial populations. Numerous computational methods have been developed to tackle this challenging task. However, most existing prediction models are limited because the number of unknown interactions is large relative to the limited diversity of the available training data. To address this problem, we introduce a model for prokaryotic virus host prediction with graph contrastive augmentation (PHPGCA). Specifically, we construct a comprehensive heterogeneous graph by integrating virus-virus protein similarity and virus-host DNA sequence similarity information. As the backbone encoder for learning node representations in the virus-prokaryote graph, we employ LGCN, a state-of-the-art graph embedding technique. Additionally, we apply graph contrastive learning to augment the node representations without the need for additional labels. We further conduct two case studies that predict the host range of multi-species phages, which helps in understanding phage ecology and evolution.
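The pipeline described in the abstract (a virus-prokaryote graph, a graph encoder, and a contrastive objective that needs no extra labels) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the PHPGCA implementation: the toy graph, the weight-free layer-averaged propagation standing in for the LGCN encoder, the noise-based construction of the two views, the InfoNCE form of the loss, and all function names and hyperparameters.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate(adj_norm, emb, num_layers=2):
    """Weight-free linear propagation; the output is the mean over all layers.
    Assumed stand-in for the graph encoder, not the authors' code."""
    layers = [emb]
    for _ in range(num_layers):
        layers.append(adj_norm @ layers[-1])
    return np.mean(layers, axis=0)

def info_nce(view_a, view_b, temperature=0.2, eps=1e-12):
    """InfoNCE loss: row i of view_a should match row i of view_b."""
    a = view_a / (np.linalg.norm(view_a, axis=1, keepdims=True) + eps)
    b = view_b / (np.linalg.norm(view_b, axis=1, keepdims=True) + eps)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Hypothetical toy graph: nodes 0-2 are viruses, nodes 3-4 are prokaryotic hosts.
num_nodes, dim = 5, 8
edges = [(0, 1),                   # virus-virus edge (protein similarity)
         (0, 3), (1, 3), (2, 4)]   # virus-host edges (DNA sequence similarity)
adj = np.zeros((num_nodes, num_nodes))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
emb = rng.normal(size=(num_nodes, dim))  # embeddings that training would update
adj_norm = normalize_adjacency(adj)

# Two augmented views built by perturbing the embeddings with small noise
# (an assumed augmentation; the abstract does not say how views are built).
view_a = propagate(adj_norm, emb + 0.1 * rng.normal(size=emb.shape))
view_b = propagate(adj_norm, emb + 0.1 * rng.normal(size=emb.shape))
contrastive_loss = info_nce(view_a, view_b)

# Host prediction: score every virus against every host by inner product.
final = propagate(adj_norm, emb)
scores = final[:3] @ final[3:].T         # shape (3 viruses, 2 hosts)
predicted_host = scores.argmax(axis=1)   # column index of the best-scoring host
```

In a trainable version, the contrastive term would typically be added to a supervised link-prediction loss and minimized by gradient descent over the embeddings; this sketch only computes the quantities once.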

MeSH Term

Prokaryotic Cells
Bacteriophages
Ecology
Host Specificity
Learning
