Benchmarking Deep Reinforcement Learning for Continuous Control

2016-04-22

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel

Abstract

Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.
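The abstract mentions tasks with partial observations, and the Benchmark Results table lists three such variants per task: limited sensors, noisy observations, and system identification. The Python sketch below shows one plausible way to derive these variants from a fully observed state. It is not rllab's implementation. The function names, the index-based selection of "sensor" entries, the Gaussian noise level, and the uniform range for the per-episode mass scale are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence

# Hypothetical helpers illustrating partially observed task variants.
# None of these names are rllab APIs.

def limited_sensors(obs: Sequence[float], keep_idx: Sequence[int]) -> List[float]:
    """Expose only selected state entries (e.g. positions, hiding velocities)."""
    return [obs[i] for i in keep_idx]

def noisy_observation(obs: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Corrupt each entry with independent zero-mean Gaussian noise of std sigma."""
    return [x + rng.gauss(0.0, sigma) for x in obs]

def sample_mass_scale(rng: random.Random, low: float = 0.5, high: float = 1.5) -> float:
    """Draw a per-episode multiplier for a physical parameter (system-identification variant)."""
    return rng.uniform(low, high)

# Hypothetical state layout: [cart position, pole angle, cart velocity, pole angular velocity]
state = [0.1, -0.05, 0.3, 1.2]
rng = random.Random(0)

print(limited_sensors(state, keep_idx=[0, 1]))  # [0.1, -0.05]
print(noisy_observation(state, sigma=0.1, rng=rng))
print(sample_mass_scale(rng))
```

In each variant the underlying dynamics are unchanged; only what the agent observes (or the physical parameters it faces in a given episode) differs, which is what makes these tasks harder than their fully observed counterparts.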

Benchmark Results

Task | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
2D Walker | TRPO | Score | 1,353.8 | | Unverified
Acrobot | TRPO | Score | -326 | | Unverified
Acrobot (limited sensors) | TRPO | Score | -83.3 | | Unverified
Acrobot (noisy observations) | TRPO | Score | -149.6 | | Unverified
Acrobot (system identifications) | TRPO | Score | -170.9 | | Unverified
Ant | TRPO | Score | 730.2 | | Unverified
Ant + Gathering | TRPO | Score | -0.4 | | Unverified
Ant + Maze | TRPO | Score | 0 | | Unverified
Cart-Pole Balancing | TRPO | Score | 4,869.8 | | Unverified
Cart-Pole Balancing (limited sensors) | TRPO | Score | 960.2 | | Unverified
Cart-Pole Balancing (noisy observations) | TRPO | Score | 606.2 | | Unverified
Cart-Pole Balancing (system identifications) | TRPO | Score | 980.3 | | Unverified
Double Inverted Pendulum | TRPO | Score | 4,412.4 | | Unverified
Full Humanoid | TRPO | Score | 287 | | Unverified
Half-Cheetah | TRPO | Score | 1,914 | | Unverified
Hopper | TRPO | Score | 1,183.3 | | Unverified
Inverted Pendulum | TRPO | Score | 247.2 | | Unverified
Inverted Pendulum (limited sensors) | TRPO | Score | 4.5 | | Unverified
Inverted Pendulum (noisy observations) | TRPO | Score | 10.4 | | Unverified
Inverted Pendulum (system identifications) | TRPO | Score | 14.1 | | Unverified
Mountain Car | TRPO | Score | -61.7 | | Unverified
Mountain Car (limited sensors) | TRPO | Score | -64.2 | | Unverified
Mountain Car (noisy observations) | TRPO | Score | -60.2 | | Unverified
Mountain Car (system identifications) | TRPO | Score | -61.6 | | Unverified
Simple Humanoid | TRPO | Score | 269.7 | | Unverified
Swimmer | TRPO | Score | 96 | | Unverified
Swimmer + Gathering | TRPO | Score | 0 | | Unverified
Swimmer + Maze | TRPO | Score | 0 | | Unverified
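The Score column is not defined on this page. One reading of the paper's setup, which should be treated as an assumption and checked against the paper itself, is that each score is the undiscounted return averaged over all training iterations, summarized as a mean across random seeds. The minimal Python sketch below computes such a score under that assumption. The nested list layout (run, iteration, trajectory, per-step rewards), the helper names, and the use of the population standard deviation are hypothetical choices for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple

# Assumed nesting: run -> iteration -> trajectory -> per-step rewards.
Trajectory = Sequence[float]
Iteration = Sequence[Trajectory]
Run = Sequence[Iteration]

def run_score(run: Run) -> float:
    """Average over iterations of the mean undiscounted return of that iteration's trajectories."""
    return mean(mean(sum(traj) for traj in iteration) for iteration in run)

def score_across_seeds(runs: Sequence[Run]) -> Tuple[float, float]:
    """Mean and population standard deviation of the per-seed scores."""
    scores = [run_score(run) for run in runs]
    return mean(scores), pstdev(scores)

# Two hypothetical seeds, each with two iterations of two trajectories.
run_a = [[[1.0, 1.0], [3.0, 3.0]], [[5.0, 5.0], [7.0, 7.0]]]  # iteration means 4.0, 12.0
run_b = [[[0.0], [0.0]], [[2.0], [2.0]]]                      # iteration means 0.0, 2.0

print(score_across_seeds([run_a, run_b]))  # (4.5, 3.5)
```

Averaging over every iteration, rather than reporting only the final iteration, rewards algorithms that learn quickly as well as those that eventually reach a high return, which is one reason such a metric might be preferred for benchmarking.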

Reproductions