CasANGCL: pre-training and fine-tuning model based on cascaded attention network and graph contrastive learning for molecular property prediction.

Zixi Zheng, Yanyan Tan, Hong Wang, Shengpeng Yu, Tianyu Liu, Cheng Liang
Author Information
  1. Zixi Zheng: School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
  2. Yanyan Tan: School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
  3. Hong Wang: School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
  4. Shengpeng Yu: School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
  5. Tianyu Liu: School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
  6. Cheng Liang: School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.

Abstract

MOTIVATION: Molecular property prediction is a central requirement in AI-driven drug design and discovery, aiming to predict molecular properties (e.g. toxicity) from mined biomolecular knowledge. Although graph neural networks have proven powerful for predicting molecular properties, imbalanced labeled data and poor generalization to newly synthesized molecules remain key issues that hinder further improvement of molecular encoding performance.
RESULTS: We propose a novel self-supervised representation learning scheme based on a Cascaded Attention Network and Graph Contrastive Learning (CasANGCL). We design a new graph network variant, termed the cascaded attention network, to encode local-global molecular representations. We construct a two-stage contrast predictor framework, an integrated end-to-end learning scheme, to tackle the label imbalance problem in training molecular samples. Moreover, we train our network with an information-flow scheme that explicitly captures edge information in the node/graph representations and obtains more fine-grained knowledge. Our model achieves an average ROC-AUC of 81.9% on 661 tasks from seven challenging benchmarks, showing better portability and generalization. Further visualization studies indicate our model's stronger representation capacity and provide interpretability.
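Illustrative sketch (not the authors' code): the minimal PyTorch fragment below shows the two generic ideas the abstract names, an attention-style message-passing step that folds edge features into node updates, loosely analogous to the information-flow scheme, and an NT-Xent contrastive loss of the kind used in graph contrastive pre-training. All class and function names, tensor shapes, and hyperparameters here are assumptions for illustration, not CasANGCL's actual architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeAwareAttention(nn.Module):
    """One attention-style message-passing step over a dense molecular graph.

    x: node features (N, d); e: edge features (N, N, d_e); adj: 0/1 mask (N, N).
    Edge features are concatenated onto neighbor messages so that bond
    information flows into the atom representations.
    """

    def __init__(self, d_node: int, d_edge: int):
        super().__init__()
        self.q = nn.Linear(d_node, d_node)
        self.k = nn.Linear(d_node + d_edge, d_node)
        self.v = nn.Linear(d_node + d_edge, d_node)

    def forward(self, x, e, adj):
        n, d = x.shape
        x_j = x.unsqueeze(0).expand(n, n, d)       # x_j[i, j] = features of node j
        msg = torch.cat([x_j, e], dim=-1)          # neighbor + edge features
        scores = (self.q(x).unsqueeze(1) * self.k(msg)).sum(-1) / d ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)      # attention over neighbors
        alpha = torch.nan_to_num(alpha)            # isolated nodes get all-zero rows
        return x + (alpha.unsqueeze(-1) * self.v(msg)).sum(dim=1)

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent loss between two batches of graph-level embeddings.

    z1[i] and z2[i] embed two augmented views of graph i; every other
    embedding in the batch acts as a negative.
    """
    b = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=-1)   # (2B, d), unit norm
    sim = z @ z.t() / tau                          # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))              # exclude self-pairs
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
    return F.cross_entropy(sim, targets)

A typical pre-training step would encode two augmented views of each molecule, pool node states to graph embeddings, and minimize nt_xent; a fine-tuning stage would then attach a property-prediction head to the pre-trained encoder.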

Grants

  1. 61672329/National Natural Science Foundation of China
  2. SDYY18058/Shandong Provincial Project of Education Scientific Plan

MeSH Terms

Learning
Drug Design
Neural Networks, Computer
