Error Controlled Actor-Critic Method to Reinforcement Learning
Xingen Gao, Fei Chao, Changle Zhou, Zhen Ge, Chih-Min Lin, Longzhi Yang, Xiang Chang, Changjing Shang
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
In the reinforcement learning (RL) algorithms which incorporate function approximation methods, the approximation error of value function inevitably cause overestimation phenomenon and have a negative impact on the convergence of the algorithms. To mitigate the negative effects of approximation error, we propose a new actor-critic algorithm called Error Controlled Actor-critic which ensures confining the approximation error in value function. In this paper, we firstly present an analysis of how the approximation error can hinder the optimization process of actor-critic methods. Then, we derive an upper boundary of the approximation error of Q function approximator, and found that the error can be lowered by placing restrictions on the KL-divergence between every two consecutive policies during the training phase of the policy. The results of experiments on a range of continuous control tasks from OpenAI gym suite demonstrate that the proposed actor-critic algorithm apparently reduces the approximation error and significantly outperforms other model-free RL algorithms.