Can large language models predict antimicrobial peptide activity and toxicity?

Markus Orsi, Jean-Louis Reymond
Author Information
  1. Markus Orsi: Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland. jean-louis.reymond@unibe.ch
  2. Jean-Louis Reymond: Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland. jean-louis.reymond@unibe.ch

Abstract

Antimicrobial peptides (AMPs) are naturally occurring or designed peptides of up to a few tens of amino acids that may help address the antimicrobial resistance crisis. However, their clinical development is limited by toxicity to human cells, a parameter that is very difficult to control. Given the similarity between peptide sequences and words, large language models (LLMs) might be able to predict AMP activity and toxicity. To test this hypothesis, we fine-tuned LLMs using data from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP). GPT-3 performed well, but not reproducibly, in predicting antimicrobial activity and hemolysis, the latter taken as a proxy for toxicity. The later GPT-3.5 performed worse and was surpassed by recurrent neural networks (RNNs) trained on sequence-activity data and by support vector machines (SVMs) trained on MAP4C molecular fingerprint-activity data. These simpler models are therefore recommended, although the rapid evolution of LLMs warrants future re-evaluation of their predictive abilities.
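
To make the fine-tuning setup concrete, the sketch below shows one way labeled peptide sequences might be serialized as prompt/completion records for a language model. It is an illustration only: the record layout, the " ->" separator, the " 1"/" 0" label strings, and the function names are assumptions, not the format used in the study, and the two sequences are placeholders with arbitrary labels rather than DBAASP entries.

```python
import json


def to_finetune_record(sequence: str, active: bool) -> dict:
    """Turn one labeled peptide into a prompt/completion pair.

    The separator and label strings are hypothetical choices made
    for this sketch; they are not taken from the article.
    """
    return {"prompt": f"{sequence} ->", "completion": " 1" if active else " 0"}


def to_jsonl(examples) -> str:
    """Serialize (sequence, label) pairs as one JSON object per line."""
    return "\n".join(
        json.dumps(to_finetune_record(seq, label)) for seq, label in examples
    )


# Placeholder sequences with arbitrary labels, for illustration only.
examples = [("KLLKLLKKLLKLLK", True), ("GSGSGSGS", False)]
jsonl = to_jsonl(examples)
print(jsonl)
```

The same (sequence, label) pairs could equally feed a sequence model such as an RNN, which is the kind of simpler baseline the abstract recommends.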
