Unifying Count-Based Exploration and Intrinsic Motivation

2016-06-06 · NeurIPS 2016 · Code Available

Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos

Code

github.com/RLAgent/state-marginal-matching
pytorch

Abstract

We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and obtain significantly improved exploration in a number of hard games, including the infamously difficult Montezuma's Revenge.
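The step from a density model to a pseudo-count can be illustrated with a minimal sketch. It assumes the pseudo-count is computed as N̂(x) = ρ(x)(1 − ρ'(x)) / (ρ'(x) − ρ(x)), where ρ(x) is the model's probability of x before observing it and ρ'(x) is the probability after one update on x. The `EmpiricalDensity` class, the function names, and the bonus form β / sqrt(N̂ + 0.01) with β = 0.05 are illustrative assumptions not stated in this listing, and the toy frequency model stands in for the pixel-level density model used on Atari frames.

```python
from collections import Counter
from fractions import Fraction
import math


class EmpiricalDensity:
    """Hypothetical toy density model: probability = empirical frequency."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def prob(self, x):
        # Probability assigned to x given everything seen so far.
        if self.total == 0:
            return Fraction(0)
        return Fraction(self.counts[x], self.total)

    def update(self, x):
        # Train the model on one more occurrence of x.
        self.counts[x] += 1
        self.total += 1


def pseudo_count(model, x):
    """Assumed form: rho * (1 - rho') / (rho' - rho). Updates the model on x."""
    rho = model.prob(x)       # probability before observing x
    model.update(x)
    rho_next = model.prob(x)  # probability after observing x once more
    gain = rho_next - rho
    if gain <= 0:
        # Degenerate case (e.g. the model already assigns probability 1):
        # treat x as fully known rather than divide by zero.
        return math.inf
    return rho * (1 - rho_next) / gain


def intrinsic_reward(n_hat, beta=0.05, eps=0.01):
    """Assumed exploration bonus; beta and eps are illustrative values."""
    return beta / math.sqrt(float(n_hat) + eps)


model = EmpiricalDensity()
for obs in ["a", "b", "a", "c"]:
    model.update(obs)

n_hat = pseudo_count(model, "a")   # "a" was seen twice before this call
bonus = intrinsic_reward(n_hat)
```

For this frequency model the pseudo-count recovers the true visit count exactly (here, 2 for "a"), which is the sanity check that makes the quantity a reasonable generalization of counting. With a learned density model over pixels, the same two probabilities would be read off before and after a training step.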

Tasks

Atari Games, Montezuma's Revenge, Reinforcement Learning (RL)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Atari 2600 Freeway	A3C-CTS	Score	30.48	—	Unverified
Atari 2600 Gravitar	A3C-CTS	Score	238.68	—	Unverified
Atari 2600 Montezuma's Revenge	DDQN-PC	Score	3,459	—	Unverified
Atari 2600 Montezuma's Revenge	A3C-CTS	Score	273.7	—	Unverified
Atari 2600 Private Eye	A3C-CTS	Score	99.32	—	Unverified
Atari 2600 Venture	A3C-CTS	Score	0	—	Unverified
