Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto, Herke van Hoof, David Meger
Code
- github.com/sfujim/TD3 (official, in paper; PyTorch, ★ 0)
- github.com/Rafael1s/Deep-Reinforcement-Learning-Udacity (PyTorch, ★ 992)
- github.com/intelligent-environments-lab/CityLearn (TensorFlow, ★ 594)
- github.com/arrival-ltd/catalyst-rl-tutorial (PyTorch, ★ 202)
- github.com/fiorenza2/OffCon3 (PyTorch, ★ 25)
- github.com/core-robotics-lab/icct (PyTorch, ★ 18)
- github.com/yydsok/oparl (PyTorch, ★ 18)
- github.com/baturaysaglam/la3p (PyTorch, ★ 18)
- github.com/seungju-k1m/sac-td3-td7 (PyTorch, ★ 15)
- github.com/GhadaSokar/Dynamic-Sparse-Training-for-Deep-Reinforcement-Learning (PyTorch, ★ 15)
Abstract
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
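The core mechanism described in the abstract, taking the minimum over a pair of target critics to limit overestimation, can be sketched in a few lines. This is an illustrative NumPy sketch under assumed array shapes, not the authors' implementation (see the official repository above); the function names `td3_target` and `smoothed_target_action` are hypothetical. In the full algorithm, the actor and target networks are additionally updated only every few critic updates (delayed policy updates).

```python
import numpy as np

def td3_target(rewards, not_done, q1_next, q2_next, gamma=0.99):
    # Clipped double Q-learning: both critics are regressed toward a
    # single target built from the MINIMUM of the two target critics,
    # which upper-bounds the less biased value estimate and limits
    # overestimation.
    q_min = np.minimum(q1_next, q2_next)
    return rewards + gamma * not_done * q_min

def smoothed_target_action(policy_action, noise_std=0.2, noise_clip=0.5,
                           max_action=1.0, rng=None):
    # Target policy smoothing: clipped Gaussian noise is added to the
    # target policy's action, so the value target is averaged over a
    # small neighborhood of similar actions.
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(policy_action)),
                    -noise_clip, noise_clip)
    return np.clip(policy_action + noise, -max_action, max_action)
```

For example, with reward 1.0, a non-terminal transition, target critic values 2.0 and 3.0, and `gamma=0.9`, the target is `1.0 + 0.9 * min(2.0, 3.0) = 2.8`, regardless of how much the larger critic overestimates.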
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Ant-v4 | TD3 | Average Return | 5,942.55 | — | Unverified |
| HalfCheetah-v4 | TD3 | Average Return | 12,026.73 | — | Unverified |
| Hopper-v4 | TD3 | Average Return | 3,319.98 | — | Unverified |
| Humanoid-v4 | TD3 | Average Return | 198.44 | — | Unverified |
| Walker2d-v4 | TD3 | Average Return | 2,612.74 | — | Unverified |