Emvirus: An embedding-based neural framework for human-virus protein-protein interactions prediction.

Pengfei Xie, Jujuan Zhuang, Geng Tian, Jialiang Yang
Author Information
  1. Pengfei Xie: College of Transportation Engineering, Dalian Maritime University, Dalian 116026, China.
  2. Jujuan Zhuang: School of Science, Dalian Maritime University, Dalian 116026, China.
  3. Geng Tian: Geneis Beijing Co., Ltd., Beijing 100102, China.
  4. Jialiang Yang: Geneis Beijing Co., Ltd., Beijing 100102, China.

Abstract

Human-virus protein-protein interactions (PPIs) play critical roles in viral infection. For example, the spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) binds primarily to the human angiotensin-converting enzyme 2 (ACE2) protein to infect human cells. Identifying and blocking these PPIs therefore contributes to controlling and preventing viral infections. However, wet-lab identification of human-virus PPIs is usually expensive, labor-intensive, and time-consuming, which creates a need for computational methods. Many machine-learning methods have been proposed recently and have achieved good results in predicting human-virus PPIs. However, most of these methods are based on protein sequence features and rely on manually extracted features, such as statistical characteristics, phylogenetic profiles, and physicochemical properties. In this work, we present Emvirus, an embedding-based neural framework that combines a convolutional neural network (CNN) with a bi-directional long short-term memory (Bi-LSTM) architecture, to predict human-virus PPIs (including human-SARS-CoV-2 PPIs). In addition, we conduct cross-viral experiments to explore the generalization ability of Emvirus. Compared with other feature extraction methods, Emvirus achieves better prediction accuracy.
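
The abstract does not describe how protein sequences are prepared for the embedding layer. As a rough illustration only, the Python sketch below assumes a common preprocessing step for word-embedding models: each sequence is split into overlapping k-mers ("words"), mapped to integer ids, and padded to a fixed length so that a human-virus pair could be fed to an embedding layer followed by CNN and Bi-LSTM layers. The function names, the choice of k = 3, the padding length, and the toy sequences are all hypothetical and are not taken from Emvirus.

```python
from collections import Counter


def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a protein sequence into overlapping k-mers ("words")."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.strip().upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(corpus: list[list[str]]) -> dict[str, int]:
    """Assign each distinct k-mer an integer id, in alphabetical order.

    Id 0 is reserved for padding and for k-mers not seen in the corpus.
    """
    counts = Counter(token for doc in corpus for token in doc)
    return {token: idx for idx, token in enumerate(sorted(counts), start=1)}


def encode(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Convert tokens to ids, then truncate or zero-pad to max_len."""
    ids = [vocab.get(token, 0) for token in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


# Toy sequences invented for illustration; they are not real proteins.
human_seq = "MKTAYIAK"
virus_seq = "MFVFLVLL"

corpus = [kmer_tokens(human_seq), kmer_tokens(virus_seq)]
vocab = build_vocab(corpus)

# One encoded (human, virus) pair, ready for an embedding lookup.
pair = (encode(corpus[0], vocab, max_len=8), encode(corpus[1], vocab, max_len=8))
print(pair)
```

In contrast to hand-crafted descriptors such as physicochemical properties, this representation leaves the feature extraction to the learned embedding and the downstream network.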

