Hierarchical graph transformer with contrastive learning for protein function prediction.

Zhonghui Gu, Xiao Luo, Jiaxiao Chen, Minghua Deng, Luhua Lai
Author Information
  1. Zhonghui Gu: Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China.
  2. Xiao Luo: Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90024, United States.
  3. Jiaxiao Chen: Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China.
  4. Minghua Deng: Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China.
  5. Luhua Lai: Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China.

Abstract

MOTIVATION: In recent years, high-throughput sequencing technologies have made protein sequences accessible at large scale. However, their functional annotation usually relies on low-throughput, costly experimental studies. Computational prediction models offer a promising alternative to accelerate this process. Graph neural networks have shown significant progress in protein research, but capturing long-distance structural correlations and identifying key residues in protein graphs remain challenging.
RESULTS: In the present study, we propose a novel deep learning model named Hierarchical graph transformEr with contrAstive Learning (HEAL) for protein function prediction. The core feature of HEAL is its ability to capture structural semantics using a hierarchical graph Transformer, which introduces a set of super-nodes that mimic functional motifs and interact with the nodes in the protein graph. These semantic-aware super-node embeddings are then aggregated with varying emphasis to produce a graph representation. To optimize the network, we used graph contrastive learning as a regularization technique that maximizes the similarity between different views of the graph representation. Evaluation on the PDBch test set shows that HEAL-PDB, trained on less data, achieves performance comparable to recent state-of-the-art methods such as DeepFRI. Moreover, HEAL, with the added benefit of unresolved protein structures predicted by AlphaFold2, outperforms DeepFRI by a significant margin on the Fmax, AUPR, and Smin metrics on the PDBch test set. Additionally, when no experimentally resolved structures are available for the proteins of interest, HEAL still achieves better performance on the AFch test set than DeepFRI and DeepGOPlus by taking advantage of AlphaFold2-predicted structures. Finally, HEAL is capable of finding functional sites through class activation mapping.
AVAILABILITY AND IMPLEMENTATION: An implementation of HEAL is available at https://github.com/ZhonghuiGu/HEAL.
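
The two mechanisms named in the RESULTS paragraph, super-node aggregation "with varying emphasis" and contrastive regularization between two views, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository above for that). The function names, the scaled dot-product attention, the softmax gate, and the InfoNCE-style loss with a temperature of 0.5 are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def super_node_pool(node_feats, super_queries, gate_w):
    """Hypothetical super-node pooling (assumed design, not HEAL's exact layer).

    node_feats:    (n_residues, d) residue-level node embeddings
    super_queries: (k, d) one learnable query per super-node
    gate_w:        (d,) scoring vector used to weight the super-nodes
    """
    d = node_feats.shape[1]
    # Each super-node attends over all residues (scaled dot-product attention).
    attn = softmax(super_queries @ node_feats.T / np.sqrt(d), axis=1)  # (k, n)
    super_emb = attn @ node_feats                                      # (k, d)
    # Aggregate super-nodes "with varying emphasis": softmax-gated weighted sum.
    weights = softmax(super_emb @ gate_w, axis=0)                      # (k,)
    graph_repr = weights @ super_emb                                   # (d,)
    return graph_repr, attn, weights

def contrastive_loss(z1, z2, temperature=0.5):
    """InfoNCE-style loss between two views (assumed form of the regularizer).

    z1, z2: (batch, d) graph representations of the same proteins under
    two different views; row i of z1 and row i of z2 form a positive pair.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                                   # (b, b)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Minimizing this pulls matching views together, pushes others apart.
    return -np.mean(np.diag(log_prob))
```

In a sketch like this, the attention matrix `attn` records which residues each super-node draws on. That is related in spirit to, but not the same as, the class activation mapping the abstract uses to locate functional sites.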

MeSH Term

Amino Acid Sequence
Benchmarking
High-Throughput Nucleotide Sequencing
Neural Networks, Computer
Semantics

