GGN-GO: geometric graph networks for predicting protein function by multi-scale structure features.

Jia Mi, Han Wang, Jing Li, Jinghong Sun, Chang Li, Jing Wan, Yuan Zeng, Jingyang Gao
Author Information
  1. Jia Mi: The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing.
  2. Han Wang: The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing.
  3. Jing Li: The College of Life Science and Technology, Beijing University of Chemical Technology, Beijing.
  4. Jinghong Sun: The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing.
  5. Chang Li: The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing.
  6. Jing Wan: The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing.
  7. Yuan Zeng: Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences.
  8. Jingyang Gao: The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing.

Abstract

Recent advances in high-throughput sequencing have led to an explosion of genomic and transcriptomic data, offering a wealth of protein sequence information. However, the functions of most proteins remain unannotated, and traditional experimental methods for annotating protein function are costly and time-consuming. Current deep learning methods typically rely on Graph Convolutional Networks to propagate features between protein residues. However, these methods fail to capture fine atomic-level geometric structural features, and they cannot directly compute or propagate structural features (such as distances, directions, and angles) during message passing, often reducing them to scalars. Additionally, difficulties in capturing long-range dependencies limit the model's ability to identify key nodes (residues). To address these challenges, we propose a geometric graph network (GGN-GO) for predicting protein function that enriches feature extraction by capturing multi-scale geometric structural features at the atomic and residue levels. We use a geometric vector perceptron to convert these features into vector representations and aggregate them with node features so that the network can better interpret and propagate them. Moreover, we introduce a graph attention pooling layer that captures key node information by adaptively aggregating local functional motifs, while contrastive learning enhances the discriminability of graph representations through random noise and different views. The experimental results show that GGN-GO outperforms six comparative methods in tasks with the most labels, for both experimentally validated and predicted protein structures. Furthermore, GGN-GO identifies functional residues that correspond to experimentally confirmed ones, demonstrating its interpretability and its ability to pinpoint key protein regions. The code and data are available at: https://github.com/MiJia-ID/GGN-GO.
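As a rough illustration of the residue-level geometric features the abstract mentions (distances, directions, and angles), the sketch below computes them from 3D coordinates while keeping the direction as a vector rather than collapsing it to a scalar. It is not taken from the GGN-GO repository: the function names (`edge_features`, `bond_angle`), the use of one coordinate per residue, and the distance cutoff are all assumptions made for this example.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Unit vector along v (zero vector if v has zero length)."""
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v) if n > 0 else (0.0, 0.0, 0.0)

def edge_features(coords, cutoff):
    """Hypothetical helper: for every ordered residue pair (i, j) closer than
    `cutoff`, return a scalar distance and a unit direction vector i -> j."""
    feats = {}
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if i == j:
                continue
            d = sub(cj, ci)
            dist = math.sqrt(dot(d, d))
            if dist <= cutoff:
                feats[(i, j)] = (dist, unit(d))
    return feats

def bond_angle(a, b, c):
    """Angle in radians at point b formed by the segments b->a and b->c."""
    cos_angle = dot(unit(sub(a, b)), unit(sub(c, b)))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))

# Toy coordinates (assumed, one point per residue), spaced 3.8 units apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
feats = edge_features(coords, cutoff=4.0)
angle = bond_angle(coords[0], coords[1], coords[2])
```

In a geometric graph network, the scalar distances and the vector directions would typically be passed along separate channels so that the directional information survives message passing; how GGN-GO does this in detail is not specified in the abstract.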

MeSH Term

Proteins
Computational Biology
Neural Networks, Computer
Algorithms
Deep Learning
Databases, Protein

Chemicals

Proteins
