Exploring chemical space for lead identification by propagating on chemical similarity network.

Jungseob Yi, Sangseon Lee, Sangsoo Lim, Changyun Cho, Yinhua Piao, Marie Yeo, Dongkyu Kim, Sun Kim, Sunho Lee
Author Information
  1. Jungseob Yi: Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea.
  2. Sangseon Lee: Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea.
  3. Sangsoo Lim: School of AI Software Convergence, Dongguk University, Pildong-ro 1-gil, Jung-gu, Seoul, South Korea.
  4. Changyun Cho: Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea.
  5. Yinhua Piao: Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea.
  6. Marie Yeo: PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea.
  7. Dongkyu Kim: PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea.
  8. Sun Kim: Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea.
  9. Sunho Lee: AIGENDRUG CO., LTD., Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea.

Abstract

Motivation: Lead identification is a fundamental step in prioritizing candidate compounds for the downstream drug discovery process. Machine learning (ML) and deep learning (DL) approaches are widely used to identify lead compounds from both chemical property and experimental information. However, ML and DL methods rarely consider compound similarity information directly, because their models are built on abstract representations of molecules. Alternatively, data mining approaches are used to explore chemical space for drug candidates by screening out undesirable compounds. A major challenge for these approaches is to search a large chemical space efficiently for desirable lead compounds while keeping the false positive rate low.
Results: In this work, we developed a network propagation (NP)-based data mining method for lead identification that searches an ensemble of chemical similarity networks. We compiled 14 fingerprint-based similarity networks. Given a target protein of interest, we use a deep learning-based drug-target interaction model to narrow down compound candidates, and then use network propagation to prioritize drug candidates that are highly correlated with a drug activity score such as IC50. In an extensive experiment with BindingDB, we showed that our approach successfully recovered compounds whose labels had been intentionally withheld for the given targets. To further demonstrate the predictive power of our approach, we identified 24 candidate leads for CLK1. Two of five synthesizable candidates were experimentally validated in binding assays. In conclusion, our framework can be useful for lead identification from very large compound databases such as ZINC.
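
As an illustration of the idea of propagating on a chemical similarity network, the following is a minimal sketch and not the authors' implementation. It assumes Tanimoto similarity over fingerprints represented as sets of on-bits, a hypothetical edge cutoff of 0.3, and random walk with restart as the propagation scheme; the abstract does not specify any of these. The compound names and fingerprints are toy data invented for the example, and only a single network is shown where the method uses an ensemble of 14.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_network(fingerprints, threshold=0.3):
    """Weighted adjacency dict; the 0.3 edge cutoff is an assumed value."""
    names = list(fingerprints)
    adj = {n: {} for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            s = tanimoto(fingerprints[u], fingerprints[v])
            if s >= threshold:
                adj[u][v] = s
                adj[v][u] = s
    return adj


def propagate(adj, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart: spread score from seed compounds over edges."""
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    degree = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m passes on its score in proportion to edge weight.
            inflow = sum(p[m] * w / degree[m] for m, w in adj[n].items())
            nxt[n] = (1 - restart) * inflow + restart * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy fingerprints (invented): one compound with known activity, three candidates.
fingerprints = {
    "known_active": {1, 2, 3, 4},
    "analog_a": {1, 2, 3, 5},
    "analog_b": {1, 2, 5, 6},
    "unrelated": {7, 8, 9},
}
seeds = {"known_active"}

network = build_network(fingerprints)
scores = propagate(network, seeds)

# Rank the unlabeled candidates by propagated score, highest first.
ranked = sorted((n for n in scores if n not in seeds),
                key=lambda n: scores[n], reverse=True)
print(ranked)
```

In this toy network the candidate most similar to the seed ranks first and the disconnected compound receives no score. Extending the sketch to an ensemble would mean repeating the propagation once per fingerprint type and combining the resulting scores, with the combination rule being a further design choice not covered here.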
