GPCRSPACE: A New GPCR Real Expanded Library Based on Large Language Models Architecture and Positive Sample Machine Learning Strategies.

Shiming Chen, Feisheng Zhong
Author Information
  1. Shiming Chen: Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China.
  2. Feisheng Zhong: Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China.

Abstract

G protein-coupled receptors (GPCRs) are essential to numerous physiological processes, and the search for novel therapeutics that target them is central to drug discovery. Despite the abundance of GPCR-targeting drugs, many receptors still lack selective modulators, which indicates significant untapped therapeutic potential. To bridge this gap, we introduce GPCRSPACE, a novel GPCR-focused library of purchasable, real chemical compounds developed using a GPCR large language model (GPCR LLM) architecture. Unlike traditional machine learning models, GPCR LLM is trained with a positive-sample strategy and does not require any negative samples to be constructed. This reduces both false negatives and the time needed to label negative samples. By learning the chemical space of GPCR-targeting molecules, GPCR LLM accelerates the identification and screening of compounds likely to interact with GPCRs. GPCRSPACE, built on GPCR LLM, outperforms existing chemical data sets in synthesizability, structural diversity, and GPCR-likeness, making it a valuable tool for GPCR drug discovery.
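The positive-sample idea can be illustrated with a deliberately small sketch. This is not the authors' GPCR LLM, whose architecture the abstract does not detail; it swaps in a character-level bigram model over SMILES strings, assumed here only to show how a model fitted to known actives alone can rank candidates by likelihood without any negative examples. The function names and the toy SMILES strings are hypothetical placeholders, not data from the paper.

```python
import math
from collections import Counter


def train_bigram_model(smiles_list):
    """Count character bigrams over positive-only SMILES strings."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for s in smiles_list:
        padded = "^" + s + "$"  # start and end markers
        vocab.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, vocab


def mean_log_likelihood(smiles, model):
    """Average per-transition log-probability with add-one smoothing."""
    bigrams, contexts, vocab = model
    padded = "^" + smiles + "$"
    v = len(vocab) + 1  # +1 reserves probability mass for unseen characters
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (bigrams[(a, b)] + 1) / (contexts[a] + v)
        total += math.log(p)
    return total / (len(padded) - 1)


# Toy "positives": arbitrary placeholder SMILES, NOT curated GPCR ligands.
positives = ["CCN(CC)CCc1ccccc1", "CN1CCC(CC1)c1ccccc1", "CCOc1ccccc1CN"]
model = train_bigram_model(positives)

# Candidates are ranked by likelihood under the positive-only model;
# no negative class is ever constructed or labeled.
candidates = ["CCNCCc1ccccc1", "[Na+].[Cl-]"]
ranked = sorted(candidates, key=lambda s: mean_log_likelihood(s, model), reverse=True)
print(ranked)
```

In this sketch a candidate that resembles the training strings scores higher than one built from characters the model has never seen, which is the same ranking principle a much larger language model would apply when scoring GPCR-likeness.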

MeSH Terms

Receptors, G-Protein-Coupled
Machine Learning
Small Molecule Libraries
Drug Discovery
Humans

Chemicals

Receptors, G-Protein-Coupled
Small Molecule Libraries

