Exploration by Random Network Distillation

2018-10-30 · ICLR

Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov

Code

Repository	Framework	Stars	Notes
github.com/openai/random-network-distillation	TensorFlow	0	Official, in paper
github.com/alirezakazemipour/ppo-rnd	PyTorch	55	
github.com/microsoft/strategically_efficient_rl	TensorFlow	21	
github.com/riveSunder/carle	PyTorch	9	
github.com/kngwyu/intrinsic-rewards	PyTorch	8	
github.com/balloch/rl-exploration-transfer	PyTorch	4	
github.com/LeejwUniverse/RL_Exploration_Pytorch	PyTorch	3	
github.com/michalnand/reinforcement_learning	PyTorch	2	
github.com/Justkim/random-network-distillation-pytorch	PyTorch	0	
github.com/jakegrigsby/supersonic	TensorFlow	0	

Abstract

We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular we establish state of the art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and occasionally completes the first level.
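
The bonus described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes tiny one-hidden-layer MLPs in NumPy, plain gradient descent, and hypothetical names and sizes (`OBS_DIM`, `FEAT_DIM`, `HIDDEN`, `intrinsic_reward`, `train_predictor`). A fixed, randomly initialized target network maps observations to features, a predictor network is trained to match those features, and the squared prediction error serves as the exploration bonus.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
OBS_DIM, FEAT_DIM, HIDDEN = 8, 4, 16


def init_mlp(rng, in_dim, hidden, out_dim):
    """Return randomly initialized weights for a one-hidden-layer MLP."""
    return {
        "w1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden)),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim)),
    }


def forward(params, obs):
    """Map a batch of observations to features; also return the hidden layer."""
    h = np.tanh(obs @ params["w1"])
    return h @ params["w2"], h


target = init_mlp(rng, OBS_DIM, HIDDEN, FEAT_DIM)     # fixed, never trained
predictor = init_mlp(rng, OBS_DIM, HIDDEN, FEAT_DIM)  # trained to match target


def intrinsic_reward(obs):
    """Exploration bonus: squared prediction error per observation."""
    t, _ = forward(target, obs)
    p, _ = forward(predictor, obs)
    return np.sum((p - t) ** 2, axis=1)


def train_predictor(obs, lr=0.02):
    """One gradient-descent step on 0.5 * mean squared prediction error."""
    t, _ = forward(target, obs)
    p, h = forward(predictor, obs)
    err = (p - t) / len(obs)                      # gradient w.r.t. predictions
    grad_w2 = h.T @ err
    grad_h = err @ predictor["w2"].T
    grad_w1 = obs.T @ (grad_h * (1.0 - h ** 2))   # backprop through tanh
    predictor["w1"] -= lr * grad_w1
    predictor["w2"] -= lr * grad_w2


# A batch of observations the agent is assumed to visit repeatedly.
familiar = rng.normal(size=(32, OBS_DIM))

bonus_before = intrinsic_reward(familiar).mean()
for _ in range(500):
    train_predictor(familiar)
bonus_after = intrinsic_reward(familiar).mean()

print(f"mean bonus on visited observations: {bonus_before:.3f} -> {bonus_after:.3f}")
```

In this sketch the bonus on the repeatedly visited batch shrinks as the predictor is trained, while observations the predictor has not been trained on would typically keep a larger error. That gap is what makes the error usable as a novelty signal. How the bonus is combined with the extrinsic reward is not specified in the abstract and is left out here.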

Tasks

Atari Games, Deep Reinforcement Learning, Montezuma's Revenge, reinforcement-learning, Reinforcement Learning, Reinforcement Learning (RL), Unsupervised Reinforcement Learning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Atari 2600 Gravitar	RND	Score	3,906	—	Unverified
Atari 2600 Montezuma's Revenge	RND	Score	8,152	—	Unverified
Atari 2600 Pitfall!	RND	Score	-3	—	Unverified
Atari 2600 Private Eye	RND	Score	8,666	—	Unverified
Atari 2600 Solaris	RND	Score	3,282	—	Unverified
Atari 2600 Venture	RND	Score	1,859	—	Unverified
