Smooth Exploration for Robotic Reinforcement Learning

2020-05-12 · Code Available

Antonin Raffin, Jens Kober, Freek Stulp

Abstract

Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL, often very successful in simulation, leads to jerky motion patterns on real robots. The resulting shaky behavior can cause poor exploration or even damage the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE: using more general features and re-sampling the noise periodically. Together they lead to a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped and an RC car. The noise sampling interval of gSDE allows a trade-off between performance and smoothness, which makes it possible to train directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.
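The two ingredients named in the abstract, noise that depends on features and noise parameters that are re-sampled only periodically, can be illustrated with a small standard-library sketch. This is not the stable-baselines3 implementation: the class name `StateDependentNoise`, the parameters `sigma` and `sample_freq`, and the purely linear form of the noise are assumptions chosen for illustration.

```python
import random


class StateDependentNoise:
    """Illustrative exploration noise: a linear function of the input features.

    noise = features . theta_eps, where theta_eps is a Gaussian matrix that is
    re-sampled only every `sample_freq` calls (all names are hypothetical).
    """

    def __init__(self, feature_dim, action_dim, sigma=0.5, sample_freq=8, seed=0):
        self.feature_dim = feature_dim
        self.action_dim = action_dim
        self.sigma = sigma              # standard deviation of the matrix entries
        self.sample_freq = sample_freq  # number of steps between re-samplings (>= 1)
        self.rng = random.Random(seed)
        self.steps = 0
        self.theta_eps = None

    def resample(self):
        # Draw a fresh feature_dim x action_dim exploration matrix.
        self.theta_eps = [
            [self.rng.gauss(0.0, self.sigma) for _ in range(self.action_dim)]
            for _ in range(self.feature_dim)
        ]

    def __call__(self, features):
        # Re-sample the matrix only at the start of each interval.
        if self.steps % self.sample_freq == 0:
            self.resample()
        self.steps += 1
        # Within an interval the noise is a deterministic function of the features.
        return [
            sum(f * row[j] for f, row in zip(features, self.theta_eps))
            for j in range(self.action_dim)
        ]


if __name__ == "__main__":
    noise = StateDependentNoise(feature_dim=3, action_dim=2, sample_freq=4)
    features = [0.1, -0.2, 0.3]
    for step in range(6):
        print(step, noise(features))
```

Between re-samplings, identical features yield identical noise, so nearby states receive similar perturbations instead of independent per-step jitter. In this sketch, a larger `sample_freq` gives smoother behaviour and a smaller one moves it closer to unstructured step-based exploration.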

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| PyBullet Ant | A2C | Return | 1,967 | | Unverified |
| PyBullet Ant | SAC gSDE | Return | 3,459 | | Unverified |
| PyBullet Ant | TD3 gSDE | Return | 3,267 | | Unverified |
| PyBullet Ant | TD3 | Return | 2,865 | | Unverified |
| PyBullet Ant | SAC | Return | 2,859 | | Unverified |
| PyBullet Ant | PPO gSDE | Return | 2,587 | | Unverified |
| PyBullet Ant | A2C gSDE | Return | 2,560 | | Unverified |
| PyBullet Ant | PPO | Return | 2,160 | | Unverified |
| PyBullet HalfCheetah | PPO | Return | 2,254 | | Unverified |
| PyBullet HalfCheetah | SAC | Return | 2,883 | | Unverified |
| PyBullet HalfCheetah | SAC gSDE | Return | 2,850 | | Unverified |
| PyBullet HalfCheetah | PPO + gSDE | Return | 2,760 | | Unverified |
| PyBullet HalfCheetah | TD3 | Return | 2,687 | | Unverified |
| PyBullet HalfCheetah | TD3 gSDE | Return | 2,578 | | Unverified |
| PyBullet HalfCheetah | A2C + gSDE | Return | 2,028 | | Unverified |
| PyBullet HalfCheetah | A2C | Return | 1,652 | | Unverified |
| PyBullet Hopper | A2C gSDE | Return | 1,448 | | Unverified |
| PyBullet Hopper | SAC gSDE | Return | 2,646 | | Unverified |
| PyBullet Hopper | PPO gSDE | Return | 2,508 | | Unverified |
| PyBullet Hopper | SAC | Return | 2,477 | | Unverified |
| PyBullet Hopper | TD3 | Return | 2,470 | | Unverified |
| PyBullet Hopper | TD3 gSDE | Return | 2,353 | | Unverified |
| PyBullet Hopper | PPO | Return | 1,622 | | Unverified |
| PyBullet Hopper | A2C | Return | 1,559 | | Unverified |
| PyBullet Walker2D | SAC gSDE | Return | 2,341 | | Unverified |
| PyBullet Walker2D | SAC | Return | 2,215 | | Unverified |
| PyBullet Walker2D | TD3 | Return | 2,106 | | Unverified |
| PyBullet Walker2D | TD3 gSDE | Return | 1,989 | | Unverified |
| PyBullet Walker2D | PPO gSDE | Return | 1,776 | | Unverified |
| PyBullet Walker2D | PPO | Return | 1,238 | | Unverified |
| PyBullet Walker2D | A2C gSDE | Return | 694 | | Unverified |
| PyBullet Walker2D | A2C | Return | 443 | | Unverified |
