Muesli: Combining Improvements in Policy Optimization
2021-04-13
Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
Abstract
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
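To make the abstract's description concrete, here is a minimal sketch of a Muesli-style policy update: a policy-gradient term combined with a regularizer that pulls the policy toward a clipped-MPO (CMPO) target, i.e. the prior policy reweighted by exponentiated clipped advantages. Function names, shapes, and the `clip`/`reg_weight` parameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def muesli_policy_loss(logits, prior_logits, advantages, actions,
                       clip=1.0, reg_weight=1.0):
    """Sketch of a Muesli-style update (illustrative, not the paper's code).

    logits, prior_logits, advantages: arrays of shape (batch, num_actions).
    actions: integer array of shape (batch,) with the taken actions.
    """
    pi = softmax(logits)
    prior = softmax(prior_logits)
    # CMPO target: prior policy reweighted by exp of clipped advantages,
    # then renormalized into a valid distribution.
    target = prior * np.exp(np.clip(advantages, -clip, clip))
    target /= target.sum(axis=-1, keepdims=True)
    # Policy-gradient term on the taken actions (advantages held fixed).
    batch = np.arange(len(actions))
    pg_loss = -(advantages[batch, actions] * np.log(pi[batch, actions])).mean()
    # KL(target || pi) regularizer pulls the policy toward the CMPO target.
    kl = (target * (np.log(target) - np.log(pi))).sum(axis=-1).mean()
    return pg_loss + reg_weight * kl
```

In the full method this loss is optimized alongside an auxiliary model-learning loss, and acting uses only the policy network, so no search is run at decision time.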
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari | Muesli | Human World Record Breakthrough | 5 | — | Unverified |