MEGA-GO: functions prediction of diverse protein sequence length using Multi-scalE Graph Adaptive neural network.

Yujian Lee, Peng Gao, Yongqi Xu, Ziyang Wang, Shuaicheng Li, Jiaxing Chen
Author Information
  1. Yujian Lee: Guangdong Provincial Key Laboratory IRADS, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai 519087, China.
  2. Peng Gao: Department of Computer Science, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai 519087, China.
  3. Yongqi Xu: Department of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510520, China.
  4. Ziyang Wang: Department of Science of Chinese Materia Medica, Guangdong Medical University, Dongguan 524023, China.
  5. Shuaicheng Li: Department of Computer Science, City University of Hong Kong, Hong Kong, China.
  6. Jiaxing Chen: Guangdong Provincial Key Laboratory IRADS, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai 519087, China.

Abstract

MOTIVATION: The increasing accessibility of large-scale protein sequences through advanced sequencing technologies has necessitated the development of efficient and accurate methods for predicting protein function. Computational prediction models have emerged as a promising solution to expedite the annotation process. However, despite making significant progress in protein research, graph neural networks face challenges in capturing long-range structural correlations and identifying critical residues in protein graphs. Furthermore, existing models have limitations in effectively predicting the function of newly sequenced proteins that are not included in protein interaction networks. This highlights the need for novel approaches integrating protein structure and sequence data.
RESULTS: We introduce the Multi-scalE Graph Adaptive neural network (MEGA-GO), which captures features across diverse protein sequence lengths at multiple scales. Its graph adaptive neural network architecture enables a more nuanced extraction of graph structure features, effectively capturing intricate relationships within biological data. Experimental results demonstrate that MEGA-GO outperforms mainstream protein function prediction models in Gene Ontology term classification, achieving an area under the precision-recall curve of 33.4%, 68.9%, and 44.6% on the biological process, molecular function, and cellular component domains, respectively. The remaining experimental results show that our model consistently surpasses state-of-the-art methods.
AVAILABILITY AND IMPLEMENTATION: The source code and data of MEGA-GO are available at https://github.com/Cheliosoops/MEGA-GO.
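
NOTE ON THE METRIC: The abstract reports area under the precision-recall curve but does not state how it is computed. As an illustration only, the sketch below uses one common definition (step-wise average precision, micro-averaged over all protein and GO-term pairs). The function names, the toy labels, and the toy scores are assumptions made for this example; they are not taken from the MEGA-GO code base and may differ from the evaluation protocol the authors used.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for a flat list of pairs."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from most to least confident (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            # Each positive raises recall by 1/total_pos; weight that step
            # by the precision observed at this rank.
            area += (true_pos / rank) / total_pos
    return area


def micro_aupr(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Flatten a proteins x GO-terms matrix and score all pairs together."""
    flat_labels = [v for row in y_true for v in row]
    flat_scores = [s for row in y_score for s in row]
    return average_precision(flat_labels, flat_scores)


# Toy data (made up for illustration): 3 proteins x 2 GO terms.
y_true = [[1, 0], [0, 1], [1, 0]]
y_score = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.1]]
toy_aupr = micro_aupr(y_true, y_score)
print(f"micro-averaged AUPR on toy data: {toy_aupr:.3f}")
```

Micro-averaging pools every protein and GO-term pair into one ranking, so frequent terms dominate the score; a per-term (macro) average would weight rare terms equally and can give a noticeably different number.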

Grants

  1. 32200526/National Natural Science Foundation of China
  2. 2022KTSCX152/Guangdong Provincial Department of Education
  3. 2022B1212010006/Key Laboratory IRADS, Guangdong Province

MeSH Term

Neural Networks, Computer
Proteins
Computational Biology
Sequence Analysis, Protein
Algorithms
Gene Ontology
Amino Acid Sequence
Databases, Protein

Chemicals

Proteins
