
Understanding the Asymptotic Performance of Model-Based RL Methods

2018-09-27

William Whitney, Rob Fergus

Abstract

In complex simulated environments, model-based reinforcement learning methods typically lag the asymptotic performance of model-free approaches. This paper uses two MuJoCo environments to understand this gap through a series of ablation experiments designed to separate the contributions of the dynamics model and planner. These reveal the importance of long planning horizons, beyond those typically used. A dynamics model that directly predicts distant states, based on current state and a long sequence of actions, is introduced. This avoids the need for many recursions during long-range planning, and thus is able to yield more accurate state estimates. These accurate predictions allow us to uncover the relationship between model accuracy and performance, and translate to higher task reward that matches or exceeds current state-of-the-art model-free approaches.
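
To make the contrast in the abstract concrete, the sketch below compares a one-step model applied recursively with a model that predicts the state H steps ahead directly from the current state and the full action sequence. It is only an illustration under stated assumptions: the toy linear dynamics, the least-squares models, and every name in it (true_step, predict_recursive, predict_direct, HORIZON) are hypothetical and are not taken from the paper, which uses MuJoCo environments and learned models.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON = 3, 2, 5

# Hypothetical toy linear dynamics standing in for a simulator (an assumption,
# not the paper's MuJoCo setup).
A = 0.9 * np.eye(STATE_DIM)
B = 0.1 * rng.standard_normal((STATE_DIM, ACTION_DIM))

def true_step(state, action):
    """Ground-truth one-step transition of the toy system."""
    return A @ state + B @ action

def true_rollout(state, actions):
    """Ground-truth state after applying every action in sequence."""
    for action in actions:
        state = true_step(state, action)
    return state

# Training data: random start states and random action sequences of length HORIZON.
n = 500
states = rng.standard_normal((n, STATE_DIM))
action_seqs = rng.standard_normal((n, HORIZON, ACTION_DIM))

# One-step model: least-squares fit of s_{t+1} from [s_t, a_t].
step_inputs = np.hstack([states, action_seqs[:, 0]])
step_targets = np.array([true_step(s, a[0]) for s, a in zip(states, action_seqs)])
W_step, *_ = np.linalg.lstsq(step_inputs, step_targets, rcond=None)

# Direct model: least-squares fit of s_{t+H} from [s_t, a_t, ..., a_{t+H-1}].
direct_inputs = np.hstack([states, action_seqs.reshape(n, -1)])
direct_targets = np.array([true_rollout(s, a) for s, a in zip(states, action_seqs)])
W_direct, *_ = np.linalg.lstsq(direct_inputs, direct_targets, rcond=None)

def predict_recursive(state, actions):
    """Apply the one-step model HORIZON times, feeding each prediction back in."""
    for action in actions:
        state = np.concatenate([state, action]) @ W_step
    return state

def predict_direct(state, actions):
    """Predict the distant state in a single call, with no recursion."""
    return np.concatenate([state, actions.ravel()]) @ W_direct
```

In this noise-free linear toy both predictors recover the true rollout, so the example shows only the difference in interface: predict_recursive feeds its own outputs back in at every step, which is where errors can compound for an imperfect model, while predict_direct makes one prediction for the whole horizon. It does not reproduce the accuracy or reward results reported in the paper.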
