A Consensus Compound/Bioactivity Dataset for Data-Driven Drug Design and Chemogenomics.

Laura Isigkeit, Apirat Chaikuad, Daniel Merk
Author Information
  1. Laura Isigkeit: Institute of Pharmaceutical Chemistry, Goethe University Frankfurt, 60438 Frankfurt, Germany.
  2. Apirat Chaikuad: Institute of Pharmaceutical Chemistry, Goethe University Frankfurt, 60438 Frankfurt, Germany.
  3. Daniel Merk: Institute of Pharmaceutical Chemistry, Goethe University Frankfurt, 60438 Frankfurt, Germany.

Abstract

Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage that argue for the combined use of data from multiple sources. Using data from ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs, we assembled a consensus dataset focusing on small molecules with bioactivity on human macromolecular targets. This improved the coverage of compound space and targets, and enabled an automated comparison and curation of structural and bioactivity data to reveal potentially erroneous entries and increase confidence. The consensus dataset comprised more than 1.1 million compounds with over 10.9 million bioactivity data points, annotated with assay type and bioactivity confidence, providing a useful ensemble for computational applications in drug design and chemogenomics.
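
The abstract does not spell out how the automated comparison works, so the following is only a minimal sketch of one way such a cross-source check could look, not the authors' pipeline. The record layout, the compound and target identifiers, the example values, the function name build_consensus, and the max_spread threshold are all hypothetical. The sketch groups bioactivity values by compound and target, then flags pairs where sources disagree by more than the threshold.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (source, compound_key, target_id, p_activity).
# All identifiers and values are invented for illustration only.
records = [
    ("ChEMBL", "CMPD-A", "TARGET-1", 7.1),
    ("PubChem", "CMPD-A", "TARGET-1", 7.3),
    ("BindingDB", "CMPD-A", "TARGET-1", 4.9),
    ("ChEMBL", "CMPD-B", "TARGET-2", 6.0),
]


def build_consensus(records, max_spread=1.0):
    """Group values per (compound, target) and flag cross-source disagreement.

    max_spread is an assumed tolerance in log units, not a value from the source.
    """
    grouped = defaultdict(list)
    for source, compound, target, value in records:
        grouped[(compound, target)].append((source, value))

    consensus = {}
    for key, entries in grouped.items():
        values = [value for _, value in entries]
        sources = sorted({source for source, _ in entries})
        spread = max(values) - min(values)
        if len(sources) == 1:
            flag = "single_source"   # nothing to compare against
        elif spread > max_spread:
            flag = "check"           # sources disagree; candidate for curation
        else:
            flag = "consistent"      # sources agree within the tolerance
        consensus[key] = {
            "sources": sources,
            "mean_p_activity": mean(values),
            "spread": spread,
            "flag": flag,
        }
    return consensus


consensus = build_consensus(records)
```

In this toy input, the pair reported by three sources is flagged for review because one value lies far from the other two, while the pair seen in only one source cannot be cross-checked at all.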

Grants

  1. 875510/Innovative Medicines Initiative

MeSH Term

Consensus
Databases, Factual
Drug Design
Humans
