Conformational Space Profiling Enhances Generic Molecular Representation for AI-Powered Ligand-Based Drug Discovery.

Lin Wang, Shihang Wang, Hao Yang, Shiwei Li, Xinyu Wang, Yongqi Zhou, Siyuan Tian, Lu Liu, Fang Bai
Author Information
  1. Lin Wang: Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  2. Shihang Wang: Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  3. Hao Yang: Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  4. Shiwei Li: Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  5. Xinyu Wang: Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  6. Yongqi Zhou: Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  7. Siyuan Tian: Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  8. Lu Liu: Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  9. Fang Bai: Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, Information Science and Technology, ShanghaiTech University, Shanghai Clinical Research and Trial Center, Shanghai, 201210, China.

Abstract

A molecular representation model is a neural network that converts molecular inputs (SMILES strings, graphs) into feature vectors, and it is an essential module across a wide range of artificial intelligence-driven drug discovery scenarios. However, current molecular representation models rarely consider the three-dimensional conformational space of molecules. They therefore lose sight of the dynamic nature of small molecules and of the way conformational space underlies the heterogeneity of molecular properties, such as multi-target mechanisms of action, recognition of different biomolecules, and dynamics in the cytoplasm and membrane. In this study, a new model named GeminiMol is proposed that incorporates conformational space profiles into molecular representation learning, extracting features that capture the complicated interplay between molecular structure and conformational space. Although GeminiMol is pre-trained on a relatively small molecular dataset (39,290 molecules), it shows balanced and superior performance not only on 67 molecular property predictions but also on 73 cellular activity predictions and 171 zero-shot tasks (including virtual screening and target identification). By capturing the molecular conformational space profile, this strategy paves the way for rapid exploration of chemical space and facilitates paradigm shifts in drug design.
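
The abstract does not say how the zero-shot tasks are scored. One common way to use a representation model for zero-shot virtual screening is to embed a query molecule and every library molecule, then rank the library by similarity in the embedding space. The Python sketch below illustrates only that general idea. The function `toy_encode` is a hypothetical stand-in that hashes SMILES character trigrams into a vector; it is not GeminiMol and knows nothing about conformational space. The names `cosine` and `rank_library` are likewise illustrative assumptions.

```python
import hashlib
import math
from typing import Callable, List, Tuple


def toy_encode(smiles: str, dim: int = 64) -> List[float]:
    """Hypothetical stand-in encoder (NOT GeminiMol).

    Hashes character trigrams of the SMILES string into a fixed-length
    count vector so the ranking code below has something to work with.
    """
    vec = [0.0] * dim
    padded = "^" + smiles + "$"  # mark the start and end of the string
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        bucket = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(
    query: str,
    library: List[str],
    encode: Callable[[str], List[float]] = toy_encode,
) -> List[Tuple[str, float]]:
    """Rank library SMILES by embedding similarity to the query, best first."""
    query_vec = encode(query)
    scored = [(smi, cosine(query_vec, encode(smi))) for smi in library]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for smi, score in rank_library("CCO", ["CCO", "c1ccccc1", "CCN"]):
        print(smi, round(score, 3))
```

In practice the `encode` argument would be replaced by a trained representation model; the ranking logic itself does not depend on which encoder is used.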

Grants

  1. 2022YFC3400501/National Key R&D Program of China
  2. 2022YFC3400500/National Key R&D Program of China
  3. 2020YFA0509700/National Key R&D Program of China
  4. 82341093/National Natural Science Foundation of China
  5. 82003654/National Natural Science Foundation of China
  6. Start-up package from ShanghaiTech University, and Shanghai Frontiers Science Center for Biomacromolecules and Precision Medicine at ShanghaiTech University
  7. 22ZR1441400/Shanghai Science and Technology Development Foundation
  8. 20QA1406400/Shanghai Science and Technology Development Foundation

MeSH Terms

Drug Discovery
Ligands
Artificial Intelligence
Molecular Conformation
Models, Molecular
Neural Networks, Computer

Chemicals

Ligands
