A thorough analysis of the contribution of experimental, derived and sequence-based predicted protein-protein interactions for functional annotation of proteins.

Stavros Makrodimitris, Marcel Reinders, Roeland van Ham
Author Information
  1. Stavros Makrodimitris: Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands.
  2. Marcel Reinders: Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands.
  3. Roeland van Ham: Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands.

Abstract

Physical interaction between two proteins is strong evidence that the proteins are involved in the same biological process, making Protein-Protein Interaction (PPI) networks a valuable data resource for predicting the cellular functions of proteins. However, PPI networks are largely incomplete for non-model species. Here, we tested to what extent these incomplete networks are still useful for genome-wide function prediction. We used two network-based classifiers to predict Biological Process Gene Ontology terms from protein interaction data in four species: Saccharomyces cerevisiae, Escherichia coli, Arabidopsis thaliana and Solanum lycopersicum (tomato). The classifiers had reasonable performance in the well-studied yeast, but performed poorly in the other species. We showed that this poor performance can be considerably improved by adding edges predicted from various data sources, such as text mining, and that associations from the STRING database are more useful than interactions predicted by a neural network from sequence-based features.
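The abstract does not name the two network-based classifiers, so the following is only an illustrative sketch of one common guilt-by-association approach, neighbour voting, and not the authors' method. The function name `neighbor_vote`, the protein identifiers, and the toy edges and annotations are all hypothetical. Each Gene Ontology term is scored for a query protein as the fraction of that protein's network neighbours annotated with the term.

```python
from collections import defaultdict


def neighbor_vote(edges, annotations, protein):
    """Score GO terms for `protein` by the fraction of its neighbours carrying each term.

    edges: iterable of (protein_a, protein_b) pairs, treated as undirected.
    annotations: dict mapping protein -> set of GO term identifiers.
    Returns a dict mapping GO term -> score in (0, 1]; empty if the protein has no neighbours.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    nbrs = neighbors.get(protein, set())
    if not nbrs:
        # An isolated protein receives no predictions at all.
        return {}

    # Count how many neighbours are annotated with each term.
    counts = defaultdict(int)
    for n in nbrs:
        for term in annotations.get(n, ()):
            counts[term] += 1

    return {term: c / len(nbrs) for term, c in counts.items()}


# Hypothetical toy network and annotations (illustrative values only).
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P1")]
annotations = {
    "P2": {"GO:0006412"},
    "P3": {"GO:0006412", "GO:0006950"},
    "P4": {"GO:0008152"},
}

scores = neighbor_vote(edges, annotations, "P1")
isolated = neighbor_vote(edges, annotations, "P5")
```

In this sketch a protein with no edges, such as the hypothetical `P5`, gets an empty prediction. This is one way to see why sparse networks limit genome-wide prediction and why adding predicted edges can help.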

MeSH Terms

Arabidopsis
Arabidopsis Proteins
Escherichia coli
Escherichia coli Proteins
Solanum lycopersicum
Molecular Sequence Annotation
Protein Interaction Maps
Saccharomyces cerevisiae
Saccharomyces cerevisiae Proteins

Chemicals

Arabidopsis Proteins
Escherichia coli Proteins
Saccharomyces cerevisiae Proteins
