SEGT-GO: a graph transformer method based on PPI serialization and explanatory artificial intelligence for protein function prediction.

Yansong Wang, Yundong Sun, Baohui Lin, Haotian Zhang, Xiaoling Luo, Yumeng Liu, Xiaopeng Jin, Dongjie Zhu
Author Information
  1. Yansong Wang: School of Computer Science and Technology, Harbin Institute of Technology Weihai Campus, Weihai, 264209, China.
  2. Yundong Sun: School of Computer Science and Technology, Harbin Institute of Technology Weihai Campus, Weihai, 264209, China.
  3. Baohui Lin: College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China.
  4. Haotian Zhang: School of Computer Science and Technology, Harbin Institute of Technology Weihai Campus, Weihai, 264209, China.
  5. Xiaoling Luo: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China.
  6. Yumeng Liu: College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China.
  7. Xiaopeng Jin: College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China. jinxiaopeng.it@gmail.com.
  8. Dongjie Zhu: School of Computer Science and Technology, Harbin Institute of Technology Weihai Campus, Weihai, 264209, China. zhudongjie@hit.edu.cn.

Abstract

BACKGROUND: A massive number of protein sequences have been obtained, but their functions remain challenging to discern. In recent research on protein function prediction, Protein-Protein Interaction (PPI) networks have played a crucial role. Uncovering potential functional relationships between distant proteins within PPI networks is essential for improving the accuracy of protein function prediction. Most current studies attempt to capture these distant relationships by stacking graph network layers, but performance gains diminish as the number of layers increases.
RESULTS: To further explore the potential functional relationships between multi-hop proteins in PPI networks, this paper proposes SEGT-GO, a Graph Transformer method based on PPI multi-hop neighborhood Serialization and Explainable artificial intelligence for large-scale multispecies protein function prediction. The multi-hop neighborhood serialization maps multi-hop information in the PPI Network into serialized feature embeddings, enabling the Graph Transformer to learn deeper functional features within the PPI Network. Based on game theory, the SHAP eXplainable Artificial Intelligence (XAI) framework optimizes model input and filters out feature noise, enhancing model performance.
CONCLUSIONS: Compared with the advanced network method DeepGraphGO, SEGT-GO achieves more competitive results on standard large-scale datasets and superior results on small ones, validating its ability to extract functional information from distant (multi-hop) proteins. Furthermore, SEGT-GO achieves superior results in cross-species learning and in predicting the functions of unseen proteins, further demonstrating the method's strong generalization ability.
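
ILLUSTRATION: The abstract does not specify how the multi-hop neighborhood serialization is computed. The following is a minimal sketch of one plausible reading, assuming that the token for hop k is the mean of the hop k-1 tokens of a protein's direct neighbors (i.e. repeated propagation over a row-normalized adjacency). The function name serialize_multihop, the toy graph, and the one-hot features are all hypothetical and are not taken from the SEGT-GO implementation.

```python
from typing import Dict, List


def serialize_multihop(
    adj: Dict[str, List[str]],
    feats: Dict[str, List[float]],
    num_hops: int,
) -> Dict[str, List[List[float]]]:
    """Turn each node's multi-hop neighborhood into a token sequence.

    Token 0 is the node's own feature vector; token k is the mean of the
    neighbors' token k-1 (an assumed aggregation, see the lead-in).
    """
    nodes = sorted(feats)
    current = {n: list(feats[n]) for n in nodes}
    seqs = {n: [current[n]] for n in nodes}
    for _ in range(num_hops):
        nxt = {}
        for n in nodes:
            neigh = adj.get(n, [])
            dim = len(current[n])
            if neigh:
                # Average the previous-hop vectors of the direct neighbors.
                nxt[n] = [
                    sum(current[m][d] for m in neigh) / len(neigh)
                    for d in range(dim)
                ]
            else:
                # Isolated node: no neighborhood signal to aggregate.
                nxt[n] = [0.0] * dim
        current = nxt
        for n in nodes:
            seqs[n].append(current[n])
    return seqs


# Toy PPI path graph A - B - C - D with one-hot protein features.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
feats = {
    "A": [1.0, 0.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0, 0.0],
    "C": [0.0, 0.0, 1.0, 0.0],
    "D": [0.0, 0.0, 0.0, 1.0],
}
seqs = serialize_multihop(adj, feats, num_hops=3)
for hop, token in enumerate(seqs["A"]):
    print(hop, token)
```

In this toy graph, protein D is three hops from protein A, and its feature only appears in A's sequence at the hop-3 token. Under the stated assumption, a Transformer that attends over the per-protein sequence can see distant-neighbor signal without stacking additional graph layers.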

Grants

  1. 62302317/National Natural Science Foundation of China
  2. 2022B1212010005/Project of Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies
  3. 20220715183602001/Shenzhen Colleges and Universities Stable Support Program

MeSH Terms

Artificial Intelligence
Proteins
Protein Interaction Mapping
Algorithms
Protein Interaction Maps
Computational Biology
Databases, Protein

Chemicals

Proteins
