Self-supervised network distillation: an effective approach to exploration in sparse reward environments

2023-02-22

Matej Pecháč, Michal Chovanec, Igor Farkaš

Code

github.com/iskandor/snd (official, in paper, PyTorch, ★ 5)
github.com/michalnand/reinforcement_learning (in paper, PyTorch, ★ 2)

Abstract

Reinforcement learning can solve decision-making problems and train an agent to behave in an environment according to a predesigned reward function. However, this approach breaks down when the reward is so sparse that the agent never encounters it during exploration. One solution may be to equip the agent with an intrinsic motivation that provides informed exploration, during which the agent is also likely to encounter external reward. Novelty detection is one of the promising branches of intrinsic motivation research. We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator, where the predictor model and the target model are both trained. We adapted three existing self-supervised methods for this purpose and experimentally tested them on a set of ten environments that are considered difficult to explore. The results show that our approach achieves faster growth and higher external reward for the same training time compared to the baseline models, which implies improved exploration in very sparse reward environments. In addition, the analytical methods we applied provide valuable explanatory insights into our proposed models.
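As a rough illustration of the "distillation error as a novelty indicator" idea, the following minimal sketch uses two tiny linear models in plain Python. It is not the authors' implementation: the function names, the linear models, the learning rate, and the toy states are all assumptions made for illustration. For brevity the target model is held fixed here, whereas in SND the target is also trained with a self-supervised objective, which this sketch omits.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def distillation_error(target_w, predictor_w, state):
    """Squared distance between target and predictor outputs (novelty signal)."""
    t = linear(target_w, state)
    p = linear(predictor_w, state)
    return sum((ti - pi) ** 2 for ti, pi in zip(t, p))

def predictor_step(target_w, predictor_w, state, lr=0.1):
    """One gradient step pulling the predictor's output toward the target's."""
    t = linear(target_w, state)
    p = linear(predictor_w, state)
    for i, row in enumerate(predictor_w):
        err = p[i] - t[i]
        for j in range(len(row)):
            row[j] -= lr * 2 * err * state[j]

rng = random.Random(0)  # fixed seed so the toy run is repeatable
# Hypothetical 2x3 models; the target stays fixed in this simplified sketch.
target_w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
predictor_w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]

familiar = [1.0, 0.0, 0.5]  # state the agent visits repeatedly
novel = [0.0, 1.0, 0.0]     # state the agent has never visited

familiar_before = distillation_error(target_w, predictor_w, familiar)
novel_before = distillation_error(target_w, predictor_w, novel)

for _ in range(50):
    predictor_step(target_w, predictor_w, familiar)

familiar_after = distillation_error(target_w, predictor_w, familiar)
novel_after = distillation_error(target_w, predictor_w, novel)

print("familiar:", familiar_before, "->", familiar_after)
print("novel:   ", novel_before, "->", novel_after)
```

In this toy setup the error on the repeatedly visited state shrinks toward zero while the error on the unvisited state stays where it started, so using the error as an intrinsic reward would favor the unvisited state.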

Tasks

Atari Games, Decision Making, Novelty Detection, Reinforcement Learning (RL), Self-Supervised Learning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Atari 2600 Gravitar	SND-VIC	Score	6,712	—	Unverified
Atari 2600 Gravitar	SND-STD	Score	4,643	—	Unverified
Atari 2600 Gravitar	SND-V	Score	2,741	—	Unverified
Atari 2600 Montezuma's Revenge	SND-V	Score	21,565	—	Unverified
Atari 2600 Montezuma's Revenge	SND-VIC	Score	7,838	—	Unverified
Atari 2600 Montezuma's Revenge	SND-STD	Score	7,212	—	Unverified
Atari 2600 Pitfall!	SND-V	Score	0	—	Unverified
Atari 2600 Pitfall!	SND-VIC	Score	0	—	Unverified
Atari 2600 Private Eye	SND-V	Score	4,213	—	Unverified
Atari 2600 Private Eye	SND-VIC	Score	17,313	—	Unverified
Atari 2600 Private Eye	SND-STD	Score	15,089	—	Unverified
Atari 2600 Solaris	SND-STD	Score	12,460	—	Unverified
Atari 2600 Solaris	SND-VIC	Score	11,865	—	Unverified
Atari 2600 Solaris	SND-V	Score	11,582	—	Unverified
Atari 2600 Venture	SND-VIC	Score	2,188	—	Unverified
Atari 2600 Venture	SND-STD	Score	2,138	—	Unverified
Atari 2600 Venture	SND-V	Score	1,787	—	Unverified
