Machine Learning Models for Mycobacterium tuberculosis Activity: Prediction and Target Visualization
Thomas R Lane, Fabio Urbina, Laura Rank, Jacob Gerlach, Olga Riabova, Alexander Lepioshkin, Elena Kazakova, Anthony Vocat, Valery Tkachenko, Stewart Cole, Vadim Makarov, Sean Ekins
Author Information
Thomas R Lane: Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Fabio Urbina: Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Laura Rank: Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Jacob Gerlach: Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Olga Riabova: Research Center of Biotechnology RAS, Moscow 119071, Russia.
Alexander Lepioshkin: Research Center of Biotechnology RAS, Moscow 119071, Russia.
Elena Kazakova: Research Center of Biotechnology RAS, Moscow 119071, Russia.
Anthony Vocat: Global Health Institute, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland.
Valery Tkachenko: Science Data Experts, 14909 Forest Landing Cir, Rockville, Maryland 20850, United States.
Stewart Cole: Institut Pasteur, Paris 75015, France.
Vadim Makarov: Research Center of Biotechnology RAS, Moscow 119071, Russia.
Sean Ekins: Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Tuberculosis (TB) is a major global health challenge, with approximately 1.4 million deaths per year. There is still a need to develop novel treatments for patients infected with Mycobacterium tuberculosis (Mtb). Many large-scale phenotypic screens have led to the identification of thousands of new compounds. Yet investment in TB drug discovery is very limited, which points to the need for new methods that increase the efficiency of drug discovery against Mtb. We have used machine learning approaches to learn from public Mtb data, producing many data sets and models with robust enrichment and hit rates that have led to the discovery of new active compounds. Recently, we curated predominantly small-molecule data and developed new machine learning classification models with 18 886 molecules at different activity cutoffs. We now describe the further validation of these Bayesian models using a library of over 1000 molecules synthesized as part of the EU-funded New Medicines for TB and More Medicines for TB programs. We highlight molecular features that are enriched in these active compounds. In addition, we provide new regression and classification models that can be used to score compound libraries or to design new molecules. We have also visualized these molecules in the context of known Mtb molecular targets and identified clusters in chemical property space, which may aid future target identification efforts. Finally, we are making these data sets publicly available, representing a significant increase in the Mtb inhibition data available in the public domain.
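Two ideas in the abstract, building classification models "at different activity cutoffs" and judging them by "enrichment and hit rates", can be made concrete with a short sketch. The code below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that potency is expressed as a value in µM where lower means more potent (for example, an MIC). It also assumes that a compound counts as active when its value is at or below the chosen cutoff, and that enrichment is computed as the common enrichment factor: the hit rate among the top-scored fraction of a library divided by the overall hit rate. The function names `label_actives` and `enrichment_factor` and all numeric values are hypothetical.

```python
from typing import List, Sequence


def label_actives(activities_um: Sequence[float], cutoff_um: float) -> List[int]:
    """Binarize potency values (hypothetical µM units, lower = more potent).

    Returns 1 (active) when a value is at or below the cutoff, else 0.
    Re-running this with different cutoffs yields the different
    training labels a cutoff-based classifier would be built on.
    """
    return [1 if a <= cutoff_um else 0 for a in activities_um]


def enrichment_factor(scores: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """Enrichment factor at `fraction` of a ranked library.

    EF = (hit rate among the top-scored compounds) / (overall hit rate).
    An EF of 1.0 means the model ranks no better than random selection.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    n_total = len(labels)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("no actives at this cutoff; enrichment is undefined")
    # Number of compounds "selected for testing" (at least one).
    n_top = max(1, int(round(fraction * n_total)))
    # Rank compound indices by model score, highest first.
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits_top = sum(labels[i] for i in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Hypothetical potencies for eight compounds and hypothetical model scores.
    potencies = [0.5, 2.0, 12.0, 50.0, 0.1, 100.0, 8.0, 1.0]
    model_scores = [0.9, 0.4, 0.3, 0.1, 0.8, 0.05, 0.6, 0.2]
    labels = label_actives(potencies, cutoff_um=1.0)
    print(labels)
    print(enrichment_factor(model_scores, labels, fraction=0.25))
```

With these made-up numbers, three of eight compounds are active at a 1 µM cutoff. Both of the two top-scored compounds are active, so the enrichment factor at 25% is 1.0 / 0.375, about 2.67. The paper's own activity cutoffs and validation metrics may be defined differently.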