Count-Based Exploration with the Successor Representation

2018-07-31 · ICLR 2019

Marlos C. Machado, Marc G. Bellemare, Michael Bowling

Code

github.com/mcmachado/count_based_exploration_sr
Official · In paper · none · ★ 31
github.com/bonniesjli/DQN_SR
pytorch · ★ 0

Abstract

In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required. Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by the similarity of successor states. Here we show that the norm of the SR, while it is being learned, can be used as a reward bonus to incentivize exploration. In order to better understand this transient behavior of the norm of the SR we introduce the substochastic successor representation (SSR) and we show that it implicitly counts the number of times each state (or feature) has been observed. We use this result to introduce an algorithm that performs as well as some theoretically sample-efficient approaches. Finally, we extend these ideas to a deep RL algorithm and show that it achieves state-of-the-art performance in Atari 2600 games when in a low sample-complexity regime.
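The idea of using the norm of a partially learned SR as an exploration signal can be illustrated with a small tabular sketch. This is not the authors' implementation: the function names, the zero initialization, the hyperparameter values, and the specific bonus form (a constant `beta` divided by the l1 norm of the SR row) are assumptions chosen for illustration. It shows only that a state visited more often accumulates a larger SR norm and therefore receives a smaller bonus.

```python
import numpy as np


def td_update_sr(sr, s, s_next, alpha=0.1, gamma=0.95):
    """Apply one TD(0) update to the SR row of state s, in place."""
    one_hot = np.zeros(sr.shape[0])
    one_hot[s] = 1.0
    # TD error for the SR: indicator of s plus discounted SR of the next state.
    td_error = one_hot + gamma * sr[s_next] - sr[s]
    sr[s] += alpha * td_error
    return sr


def exploration_bonus(sr, s, beta=1.0, eps=1e-8):
    """Assumed bonus form: beta over the l1 norm of the SR row of s."""
    norm = np.abs(sr[s]).sum()
    return beta / max(norm, eps)  # eps guards against a never-visited state


# Hypothetical 3-state example: the SR starts at zero (an assumption here),
# state 0 is visited ten times and state 1 only once.
sr = np.zeros((3, 3))
for _ in range(10):
    td_update_sr(sr, s=0, s_next=1)
td_update_sr(sr, s=1, s_next=2)

bonus_often_visited = exploration_bonus(sr, 0)
bonus_rarely_visited = exploration_bonus(sr, 1)
print(bonus_often_visited, bonus_rarely_visited)
```

In this sketch the bonus would be added to the environment reward before the value update, so that rarely visited states look temporarily more attractive; how the bonus is scaled and combined with the reward is left open here.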

Tasks

Atari Games, Efficient Exploration, Reinforcement Learning, Reinforcement Learning (RL)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Atari 2600 Freeway	DQNMMCe	Score	29.5	—	Unverified
Atari 2600 Gravitar	DQNMMCe	Score	1,078.3	—	Unverified
Atari 2600 Montezuma's Revenge	DQNMMCe+SR	Score	1,778.6	—	Unverified
Atari 2600 Montezuma's Revenge	DQN+SR	Score	1,778.8	—	Unverified
Atari 2600 Private Eye	DQNMMCe+SR	Score	99.1	—	Unverified
Atari 2600 Solaris	DQNMMCe	Score	2,244.6	—	Unverified
Atari 2600 Venture	DQNMMCe+SR	Score	1,241.8	—	Unverified
