Semi-supervised prediction of protein interaction sites from unlabeled sample information.

Ye Wang, Changqing Mei, Yuming Zhou, Yan Wang, Chunhou Zheng, Xiao Zhen, Yan Xiong, Peng Chen, Jun Zhang, Bing Wang
Author Information
  1. Ye Wang: School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China.
  2. Changqing Mei: School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China.
  3. Yuming Zhou: School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China.
  4. Yan Wang: School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China.
  5. Chunhou Zheng: Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, 230601, Anhui, China.
  6. Xiao Zhen: School of Computer Science and Technology, Anhui University of Technology, Maanshan, 243002, Anhui, China.
  7. Yan Xiong: School of Computer Science and Technology, University of Science & Technology, Hefei, 230026, Anhui, China.
  8. Peng Chen: Institute of Health Sciences, Anhui University, Hefei, 230601, Anhui, China. pchen@ahu.edu.cn.
  9. Jun Zhang: College of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China.
  10. Bing Wang: School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China. wangbing@ustc.edu.

Abstract

BACKGROUND: Recognizing protein interaction sites is important for understanding many biological processes and signaling pathways and for drug design. However, because only a small fraction of protein interactions has been identified, most sites on protein sequences cannot be labeled as interface or non-interface sites, which limits the prediction accuracy and generalization ability of interaction-site predictors. It is therefore necessary to improve the prediction of protein interaction sites by using large amounts of unlabeled data together with small amounts of labeled data and background knowledge.
RESULTS: In this work, three methods based on semi-supervised support vector machines are proposed to improve protein interaction site prediction by incorporating information from unlabeled protein sites. Five features related to the evolutionary conservation of amino acids are extracted from the HSSP database and the ConSurf Server to represent the residues within a protein sequence: residue spatial sequence spectrum, residue sequence information entropy, relative entropy, residue sequence conserved weight, and residue-based evolution rate. Three predictors are then built, one with each type of semi-supervised support vector machine algorithm, to identify interface residues on the protein surface.
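Two of the conservation features named above, sequence information entropy and relative entropy, are commonly computed per alignment column. The following is a minimal sketch under that assumption, not the authors' exact definition: the function names, the toy columns, and the uniform background distribution are all hypothetical, and the abstract does not say how HSSP frequencies are normalized.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy in bits of one column's residue frequencies.

    Low entropy means a highly conserved position.
    """
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)

def relative_entropy(freqs, background):
    """Kullback-Leibler divergence in bits of a column from a background distribution."""
    return sum(
        p * math.log2(p / background[aa])
        for aa, p in freqs.items()
        if p > 0
    )

# Hypothetical alignment columns (frequencies sum to 1).
conserved = {"L": 1.0}
variable = {"A": 0.25, "L": 0.25, "K": 0.25, "E": 0.25}

# Assumed background: uniform over the 20 standard amino acids.
uniform_bg = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}

entropy_conserved = shannon_entropy(conserved)
entropy_variable = shannon_entropy(variable)
rel_conserved = relative_entropy(conserved, uniform_bg)
rel_variable = relative_entropy(variable, uniform_bg)
```

In this toy setting the fully conserved column has zero entropy and the largest divergence from the uniform background, which is the behavior one would expect of a conservation feature.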
CONCLUSION: The experimental results demonstrate that the semi-supervised approaches effectively improve the prediction of protein interaction sites when unlabeled information is incorporated into the predictors. The best of the three achieves an accuracy of 70.7%, a sensitivity of 62.67%, and a specificity of 78.72%. Compared with existing studies, the semi-supervised models show improved prediction performance.
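The three reported measures follow the standard confusion-matrix definitions for a binary classifier (interface = 1, non-interface = 0). The sketch below shows those definitions on made-up labels; the function name and the toy data are illustrative only and do not reproduce the study's results.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # interface found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-interface found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed interface
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # recall on interface residues
        "specificity": tn / (tn + fp),  # recall on non-interface residues
    }

# Toy labels for eight surface residues (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

metrics = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters for this task because interface residues are typically a minority of surface residues, so accuracy alone can hide poor recall on the interface class.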

MeSH Terms

Algorithms
Amino Acid Sequence
Amino Acids
Biochemical Phenomena
Conserved Sequence
Entropy
Proteins
Support Vector Machine

Chemicals

Amino Acids
Proteins
