3D graph contrastive learning for molecular property prediction.

Kisung Moon, Hyeon-Jin Im, Sunyoung Kwon
Author Information
  1. Kisung Moon: Department of Information Convergence Engineering, Pusan National University, Yangsan 50612, Korea.
  2. Hyeon-Jin Im: Department of Information Convergence Engineering, Pusan National University, Yangsan 50612, Korea.
  3. Sunyoung Kwon: Department of Information Convergence Engineering, Pusan National University, Yangsan 50612, Korea.

Abstract

MOTIVATION: Self-supervised learning (SSL) learns data representations by exploiting supervision inherent in the data itself. It has drawn attention in the drug field, where annotated data are scarce because experiments are time-consuming and expensive. SSL on large amounts of unlabeled data has shown excellent performance for molecular property prediction, but several issues remain. (i) Existing SSL models are large-scale, which limits their use where computing resources are insufficient. (ii) Most current models do not use 3D structural information for molecular representation learning, or use it only partially, even though the activity of a drug is closely related to the structure of the drug molecule. (iii) Previous models that apply contrastive learning to molecules use augmentations that permute atoms and bonds, so molecules with different characteristics can end up among the same positive samples. To solve these problems, we propose a novel small-scale contrastive learning framework, 3D Graph Contrastive Learning (3DGCL), for molecular property prediction.
RESULTS: 3DGCL learns molecular representations that reflect the molecule's structure through a pretraining process that does not change the semantics of the drug. Using only 1128 samples as pretraining data and 0.5 million model parameters, we achieved state-of-the-art or comparable performance on six benchmark datasets. Extensive experiments demonstrate that 3D structural information based on chemical knowledge is essential to molecular representation learning for property prediction.
AVAILABILITY AND IMPLEMENTATION: Data and code are available at https://github.com/moonkisung/3DGCL.
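
ILLUSTRATION: The role of positive samples in contrastive pretraining can be sketched with a generic NT-Xent (normalized temperature-scaled cross-entropy) loss. This is a minimal sketch, not the 3DGCL implementation: the abstract does not state which loss or encoder is used, so the function names, the temperature value, and the toy 2D embeddings below are all assumptions. Each molecule is assumed to have two views whose embeddings should agree (for example, two 3D conformers of the same molecule, which is also an assumption here).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """Generic NT-Xent loss (hypothetical helper, not taken from 3DGCL).

    views_a[i] and views_b[i] are embeddings of two views of molecule i.
    Every other embedding in the batch acts as a negative sample.
    """
    n = len(views_a)
    embeddings = list(views_a) + list(views_b)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same molecule
        # Scaled similarities to every embedding except the anchor itself.
        logits = [
            cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(2 * n)
            if j != i
        ]
        pos_logit = cosine(embeddings[i], embeddings[positive]) / temperature
        # Negative log-softmax of the positive pair over all candidates.
        total += math.log(sum(math.exp(x) for x in logits)) - pos_logit
    return total / (2 * n)


# Toy embeddings (assumed values): two molecules, two views each.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[1.0, 0.1], [0.1, 1.0]]

aligned = nt_xent_loss(views_a, views_b)
# Pairing each molecule with a view of a *different* molecule mimics an
# augmentation that puts dissimilar molecules among the positive samples.
mismatched = nt_xent_loss(views_a, list(reversed(views_b)))
print(aligned < mismatched)
```

In this toy setting the loss is lower when the paired views come from the same molecule than when the pairs are swapped, which is the situation issue (iii) warns about: an augmentation that changes a molecule's characteristics trains the model to pull dissimilar molecules together.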

MeSH Term

Benchmarking
Semantics
Molecular Conformation
