Benchmarking Deep Reinforcement Learning for Continuous Control
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
Code
- github.com/rllab/rllab (official, in paper; TensorFlow; ★ 0)
- github.com/rll/rllab (TensorFlow; ★ 3,050)
- github.com/russellmendonca/maesn_suite (TensorFlow; ★ 44)
- github.com/sisl/event-driven-rllab (TensorFlow; ★ 0)
- github.com/sisl/gail-driver (framework unspecified; ★ 0)
- github.com/rlworkgroup/garage (TensorFlow; ★ 0)
- github.com/wyndwarrior/imitation_from_observation (TensorFlow; ★ 0)
- github.com/Dam930/rllab (TensorFlow; ★ 0)
- github.com/richardrl/cartpole-request-for-research (PyTorch; ★ 0)
- github.com/openai/rllab (TensorFlow; ★ 0)
Abstract
Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.
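The scores in the results table are per-task average returns collected over sampled trajectories. A minimal sketch of that evaluation protocol, using a hypothetical `ToyCartPole` stand-in with a gym-style `reset`/`step` interface (the benchmark's real tasks are physics simulations; this toy environment, its dynamics, and the `average_return` helper are illustrative, not rllab's API):

```python
import random

class ToyCartPole:
    """Hypothetical stand-in environment (not a benchmark task): a scalar
    state that the agent nudges with a scalar action; the episode ends when
    the state leaves [-1, 1] or the horizon is reached."""
    def __init__(self, horizon=100):
        self.horizon = horizon
    def reset(self):
        self.t = 0
        self.state = random.uniform(-0.05, 0.05)
        return self.state
    def step(self, action):
        self.t += 1
        self.state += 0.1 * action
        reward = 1.0 if abs(self.state) < 1.0 else 0.0
        done = self.t >= self.horizon or abs(self.state) >= 1.0
        return self.state, reward, done

def average_return(env, policy, n_episodes=10, seed=0):
    """Average undiscounted return over n_episodes rollouts of policy."""
    random.seed(seed)
    total = 0.0
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
    return total / n_episodes
```

For example, the proportional policy `lambda s: -s` shrinks the state toward zero each step, so it survives the full horizon and earns the maximum return of 100 per episode.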
Benchmark Results
| Task | Algorithm | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 2D Walker | TRPO | Score | 1,353.8 | — | Unverified |
| Acrobot | TRPO | Score | -326 | — | Unverified |
| Acrobot (limited sensors) | TRPO | Score | -83.3 | — | Unverified |
| Acrobot (noisy observations) | TRPO | Score | -149.6 | — | Unverified |
| Acrobot (system identification) | TRPO | Score | -170.9 | — | Unverified |
| Ant | TRPO | Score | 730.2 | — | Unverified |
| Ant + Gathering | TRPO | Score | -0.4 | — | Unverified |
| Ant + Maze | TRPO | Score | 0 | — | Unverified |
| Cart-Pole Balancing | TRPO | Score | 4,869.8 | — | Unverified |
| Cart-Pole Balancing (limited sensors) | TRPO | Score | 960.2 | — | Unverified |
| Cart-Pole Balancing (noisy observations) | TRPO | Score | 606.2 | — | Unverified |
| Cart-Pole Balancing (system identification) | TRPO | Score | 980.3 | — | Unverified |
| Double Inverted Pendulum | TRPO | Score | 4,412.4 | — | Unverified |
| Full Humanoid | TRPO | Score | 287 | — | Unverified |
| Half-Cheetah | TRPO | Score | 1,914 | — | Unverified |
| Hopper | TRPO | Score | 1,183.3 | — | Unverified |
| Inverted Pendulum | TRPO | Score | 247.2 | — | Unverified |
| Inverted Pendulum (limited sensors) | TRPO | Score | 4.5 | — | Unverified |
| Inverted Pendulum (noisy observations) | TRPO | Score | 10.4 | — | Unverified |
| Inverted Pendulum (system identification) | TRPO | Score | 14.1 | — | Unverified |
| Mountain Car | TRPO | Score | -61.7 | — | Unverified |
| Mountain Car (limited sensors) | TRPO | Score | -64.2 | — | Unverified |
| Mountain Car (noisy observations) | TRPO | Score | -60.2 | — | Unverified |
| Mountain Car (system identification) | TRPO | Score | -61.6 | — | Unverified |
| Simple Humanoid | TRPO | Score | 269.7 | — | Unverified |
| Swimmer | TRPO | Score | 96 | — | Unverified |
| Swimmer + Gathering | TRPO | Score | 0 | — | Unverified |
| Swimmer + Maze | TRPO | Score | 0 | — | Unverified |
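The scores above are undiscounted returns, while algorithms such as TRPO optimize a discounted objective during training. A minimal sketch of the standard discounted-return recursion G_t = r_t + γ·G_{t+1} (the default γ here is illustrative, not a claim about the paper's exact hyperparameters):

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G = r_0 + gamma*r_1 + gamma^2*r_2 + ... by accumulating
    backward over the reward sequence: G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With `gamma=1.0` this reduces to the plain sum of rewards, i.e. the undiscounted return reported in the table.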