Annotating protein functions via fusing multiple biological modalities.

Wenjian Ma, Xiangpeng Bi, Huasen Jiang, Zhiqiang Wei, Shugang Zhang
Author Information
  1. Wenjian Ma: College of Computer Science and Technology, Ocean University of China, Qingdao, China.
  2. Xiangpeng Bi: College of Computer Science and Technology, Ocean University of China, Qingdao, China.
  3. Huasen Jiang: College of Computer Science and Technology, Ocean University of China, Qingdao, China.
  4. Zhiqiang Wei: College of Computer Science and Technology, Ocean University of China, Qingdao, China.
  5. Shugang Zhang: College of Computer Science and Technology, Ocean University of China, Qingdao, China. zsg@ouc.edu.cn.

Abstract

Understanding protein function is of great significance for revealing disease pathogenesis and discovering new targets. Benefiting from the explosive growth of the protein universe, deep learning has been applied to accelerate the protein annotation cycle using different biological modalities. However, most existing deep learning-based methods fail to fuse different biological modalities effectively, resulting in low-quality protein representations, and they also tend to converge to suboptimal solutions because of sparse label representations. To address these issues, we propose MIF2GO (Multimodal Information Fusion to infer Gene Ontology terms), a multiprocedural approach for fusing heterogeneous biological modalities and annotating protein functions. MIF2GO sequentially fuses up to six biological modalities spanning different biological levels in three steps, leading to powerful protein representations. Evaluation results on seven benchmark datasets show that the proposed method not only considerably outperforms state-of-the-art methods, but also demonstrates great robustness and generalizability across species. We also present biological insights into the associations between those modalities and protein functions. This research provides a robust framework for integrating multimodal biological data and a scalable solution for protein function annotation, ultimately facilitating advances in precision medicine and the discovery of novel therapeutic strategies.

Grants

  1. No. 62306293, National Natural Science Foundation of China

MeSH Term

Humans
Molecular Sequence Annotation
Proteins
Deep Learning
Gene Ontology
Computational Biology
Animals
Databases, Protein

Chemicals

Proteins
