A multimodal model for protein function prediction.

Yu Mao, WenHui Xu, Yue Shun, LongXin Chai, Lei Xue, Yong Yang, Mei Li
Author Information
  1. Yu Mao: State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan, 430062, Hubei, China.
  2. WenHui Xu: State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan, 430062, Hubei, China.
  3. Yue Shun: State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan, 430062, Hubei, China.
  4. LongXin Chai: State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan, 430062, Hubei, China.
  5. Lei Xue: State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan, 430062, Hubei, China.
  6. Yong Yang: State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan, 430062, Hubei, China. yangyong@hubu.edu.cn.
  7. Mei Li: State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan, 430062, Hubei, China. meili@hubu.edu.cn.

Abstract

Protein function, which is determined by sequence, structure, and other characteristics, plays a crucial role in an organism's performance. Existing protein function prediction methods rely mainly on sequence data and often ignore structural properties that are crucial for accurate prediction. Protein structure provides richer spatial and functional insights, which can significantly improve prediction accuracy. In this work, we propose a multimodal protein function prediction model (MMPFP) that integrates protein sequence and structure information using a graph convolutional network (GCN), a convolutional neural network (CNN), and a Transformer. We validate the model on the PDBest dataset, demonstrating that MMPFP outperforms traditional single-modal models in the molecular function (MF), biological process (BP), and cellular component (CC) prediction tasks. Specifically, MMPFP achieved AUPR scores of 0.693, 0.355, and 0.478; Fmax scores of 0.752, 0.629, and 0.691; and Smin scores of 0.336, 0.488, and 0.459, a 3-5% improvement over single-modal models. Additionally, ablation studies confirm the effectiveness of the Transformer module within the GCN branch, further validating MMPFP's superior performance over existing methods. This multimodal approach offers a more accurate and comprehensive framework for protein function prediction, addressing key limitations of current models.
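
Function-prediction benchmarks commonly report a protein-centric maximum F-measure (Fmax) alongside AUPR. The sketch below illustrates how such a score is typically computed, assuming the usual benchmark convention: precision is averaged over proteins with at least one prediction at a given threshold, recall is averaged over all annotated proteins, and the best F1 across thresholds is kept. The function name `f_max`, the input layout (one set of true terms and one term-to-score mapping per protein), and the 0.01 threshold grid are illustrative assumptions, not the authors' evaluation code.

```python
def f_max(y_true, y_score, thresholds=None):
    """Protein-centric maximum F-measure (illustrative sketch).

    y_true  : list of sets, the true term ids for each protein
    y_score : list of dicts, term id -> predicted score in [0, 1]
    """
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 0.99
        thresholds = [i / 100 for i in range(1, 100)]

    best = 0.0
    for t in thresholds:
        precisions = []  # only proteins with at least one prediction at t
        recalls = []     # every protein that has at least one true term
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            if predicted:
                precisions.append(hits / len(predicted))
            if truth:
                recalls.append(hits / len(truth))

        if not precisions or not recalls:
            continue  # nothing predicted (or nothing annotated) at this threshold

        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


if __name__ == "__main__":
    truth = [{"GO:1", "GO:2"}, {"GO:3"}]
    scores = [
        {"GO:1": 0.9, "GO:2": 0.4, "GO:3": 0.1},
        {"GO:3": 0.8, "GO:1": 0.3},
    ]
    print(f_max(truth, scores))
```

In this toy input, any threshold between 0.3 and 0.4 separates every true term from every false one, so the score reaches its maximum. On real data the curve is rarely this clean, and the chosen threshold grid can slightly affect the reported value.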

Grants

  1. 235802001002/Hubei Provincial Talent Project

MeSH Terms

Proteins
Computational Biology
Databases, Protein
Algorithms
Protein Conformation
Neural Networks, Computer

Chemicals

Proteins
