Multisource Transfer Double DQN Based on Actor Learning.

Jie Pan, Xuesong Wang, Yuhu Cheng, Qiang Yu

Abstract

Deep reinforcement learning (RL) combines the psychological mechanisms of "trial and error" and "reward and punishment" from RL with the powerful feature representation and nonlinear mapping capabilities of deep learning, and currently plays an essential role in artificial intelligence and machine learning. Because an RL agent must constantly interact with its surroundings, the deep Q network (DQN) inevitably has to learn a large number of network parameters, which results in low learning efficiency. In this paper, a multisource transfer double DQN (MTDDQN) based on actor learning is proposed. Transfer learning is integrated with deep RL so that the agent can collect, summarize, and transfer action knowledge, including policy mimic and feature regression, to the training of related tasks. DQN suffers from action overestimation, i.e., the lower probability limit of the action corresponding to the maximum Q value is nonzero. The transfer network is therefore trained with double DQN to eliminate the error accumulation caused by action overestimation. In addition, to avoid negative transfer, i.e., to ensure strong correlations between source and target tasks, a multisource transfer learning mechanism is applied. Atari 2600 games are tested on the Arcade Learning Environment platform to evaluate the feasibility and performance of MTDDQN against mainstream approaches such as DQN and double DQN. Experiments show that MTDDQN achieves not only human-like actor-learning transfer capability, but also the desired learning efficiency and testing accuracy on the target task.
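For readers unfamiliar with the two techniques the abstract leans on, the following is a minimal sketch in assumed PyTorch conventions, not the authors' released code. It shows the double DQN target that curbs action overestimation, together with a hypothetical policy-mimic plus feature-regression transfer loss; the abstract names these two components but not their exact form, so the functions, tensor shapes, and hyperparameters (tau, beta) below are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
        # y = r + gamma * Q_target(s', argmax_a Q_online(s', a)):
        # the online network selects the greedy next action and the
        # target network evaluates it, decoupling the max operator
        # that inflates value estimates in vanilla DQN.
        with torch.no_grad():
            next_actions = online_net(next_state).argmax(dim=1, keepdim=True)
            next_q = target_net(next_state).gather(1, next_actions).squeeze(1)
            # Bootstrap only for non-terminal transitions.
            return reward + gamma * (1.0 - done.float()) * next_q

    def transfer_loss(student_q, teacher_q, student_feat, teacher_feat, tau=0.01, beta=1.0):
        # Hypothetical policy-mimic + feature-regression loss (assumed,
        # not from the paper): softened Q-value distillation via KL
        # divergence, plus an L2 penalty pulling the student's
        # intermediate features toward the source-task teacher's.
        mimic = F.kl_div(
            F.log_softmax(student_q / tau, dim=1),
            F.softmax(teacher_q / tau, dim=1),
            reduction="batchmean",
        )
        feature = F.mse_loss(student_feat, teacher_feat)
        return mimic + beta * feature

In this reading, the multisource mechanism would weight or select among several source-task teachers so that only strongly correlated tasks contribute to the transfer loss, which is how the abstract describes guarding against negative transfer.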

MeSH Terms

Algorithms
Computer Simulation
Humans
Machine Learning
Neural Networks, Computer
Reinforcement, Psychology
Transfer, Psychology
