Target-driven machine learning-enabled virtual screening (TAME-VS) platform for early-stage hit identification.

Yuemin Bian, Jason J Kwon, Cong Liu, Enrico Margiotta, Mrinal Shekhar, Alexandra E Gould
Author Information
  1. Yuemin Bian: Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, United States.
  2. Jason J Kwon: Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States.
  3. Cong Liu: Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, United States.
  4. Enrico Margiotta: Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, United States.
  5. Mrinal Shekhar: Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, United States.
  6. Alexandra E Gould: Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, United States.

Abstract

High-throughput screening (HTS) methods enable the empirical evaluation of large numbers of compounds and can be augmented by virtual screening (VS) techniques, which save time and money by prioritizing potentially active compounds for experimental testing. Structure-based and ligand-based VS approaches have been extensively studied and applied in drug discovery practice, with proven outcomes in advancing candidate molecules. However, the experimental data required for VS are expensive to generate, and identifying hits effectively and efficiently is particularly challenging during early-stage drug discovery for novel protein targets. Herein, we present our TArget-driven Machine learning-Enabled VS (TAME-VS) platform, which leverages existing chemical databases of bioactive molecules to facilitate hit finding in a modular fashion. Our methodology enables bespoke hit identification campaigns starting from a user-defined protein target. The input target ID is used to perform a homology-based target expansion, followed by retrieval of compounds from a large compilation of molecules with experimentally validated activity. The retrieved compounds are then vectorized and used to train machine learning (ML) models. These models are deployed for model-based inferential virtual screening, and compounds are nominated based on their predicted activity. The platform was retrospectively validated across ten diverse protein targets and demonstrated clear predictive power. The implemented methodology provides a flexible and efficient approach that is accessible to a wide range of users. The TAME-VS platform is publicly available at https://github.com/bymgood/Target-driven-ML-enabled-VS to facilitate early-stage hit identification.
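The workflow described above (target expansion, compound retrieval, vectorization, model training, inferential screening) can be illustrated with a toy, standard-library-only Python sketch. Everything concrete in it is an assumption for illustration, not the TAME-VS implementation: homologs are supplied as precomputed (target ID, percent identity) pairs such as a sequence-similarity search might return, activities are IC50 values in nM labelled active at a hypothetical 1,000 nM cutoff, a hashed character n-gram bit set stands in for a real chemical fingerprint, and a k-nearest-neighbour vote over Tanimoto similarity stands in for a trained ML model. All target IDs, SMILES, and thresholds are invented.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One bioactivity measurement (hypothetical schema)."""
    target_id: str
    smiles: str
    ic50_nm: float


def expand_targets(query_id, homolog_hits, min_identity=40.0):
    """Homology-based expansion: keep the query plus hits at or above an
    assumed percent-identity cutoff. homolog_hits are (target_id, identity)
    pairs, e.g. parsed from a sequence-similarity search."""
    return {query_id} | {tid for tid, ident in homolog_hits if ident >= min_identity}


def fingerprint(smiles, n=3, n_bits=1024):
    """Vectorization stand-in: hash character n-grams of the SMILES string into
    a fixed-size bit set. blake2b is used because Python's built-in hash() of
    str is randomized per process. Not a substitute for a real fingerprint."""
    bits = set()
    for i in range(max(1, len(smiles) - n + 1)):
        gram = smiles[i:i + n].encode()
        digest = hashlib.blake2b(gram, digest_size=4).digest()
        bits.add(int.from_bytes(digest, "little") % n_bits)
    return frozenset(bits)


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_training_set(records, targets, active_cutoff_nm=1000.0):
    """Compound retrieval and labelling: keep measurements on any expanded
    target; a compound is active (1) if any such IC50 is <= the cutoff."""
    labels = {}
    for r in records:
        if r.target_id in targets:
            active = int(r.ic50_nm <= active_cutoff_nm)
            labels[r.smiles] = max(labels.get(r.smiles, 0), active)
    return [(fingerprint(smi), label) for smi, label in labels.items()]


def predict_active_prob(train, fp, k=3):
    """Stand-in model: fraction of actives among the k training compounds
    most similar to fp (a k-nearest-neighbour vote)."""
    nearest = sorted(train, key=lambda item: tanimoto(item[0], fp), reverse=True)[:k]
    if not nearest:
        return 0.0
    return sum(label for _, label in nearest) / len(nearest)


def screen(train, library, top_n=10, k=3):
    """Inferential screening: score every library SMILES and nominate the
    top_n by predicted activity (ties broken alphabetically by SMILES)."""
    scored = [(predict_active_prob(train, fingerprint(s), k), s) for s in library]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


if __name__ == "__main__":
    targets = expand_targets("QUERY", [("HOMOLOG_A", 62.0), ("DISTANT_B", 18.0)])
    records = [
        Record("QUERY", "CC(=O)Nc1ccc(O)cc1", 250.0),
        Record("HOMOLOG_A", "CC(=O)Nc1ccc(OC)cc1", 800.0),
        Record("QUERY", "CCCCCCCCCCCC(=O)O", 50000.0),
        Record("DISTANT_B", "CN1CCC[C@H]1c1cccnc1", 10.0),  # excluded: below identity cutoff
    ]
    train = build_training_set(records, targets)
    library = ["CC(=O)Nc1ccc(OCC)cc1", "CCCCCCCCCC(=O)O", "c1ccccc1"]
    for score, smi in screen(train, library, top_n=2, k=1):
        print(f"{score:.2f}  {smi}")
```

Running the script prints the two nominated library compounds with their scores. Keeping each stage in its own function is meant to make it easy to swap in, for example, a real fingerprint (such as RDKit Morgan fingerprints) for `fingerprint` or a trained classifier for `predict_active_prob`; the specific databases, fingerprints, and models TAME-VS uses are not stated in this abstract.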

Grants

  1. F32 CA243290/NCI NIH HHS
