Self-supervised network distillation: an effective approach to exploration in sparse reward environments

2023-02-22

Matej Pecháč, Michal Chovanec, Igor Farkaš

Code available.

Abstract

Reinforcement learning can solve decision-making problems and train an agent to behave in an environment according to a predesigned reward function. However, this approach becomes problematic when the reward is so sparse that the agent never encounters it while exploring the environment. One solution is to equip the agent with an intrinsic motivation that drives informed exploration, during which the agent is also likely to encounter external reward. Novelty detection is one of the promising branches of intrinsic motivation research. We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms that use the distillation error as a novelty indicator and in which both the predictor model and the target model are trained. We adapted three existing self-supervised methods for this purpose and tested them experimentally on a set of ten environments that are considered difficult to explore. The results show that, for the same training time, our approach achieves faster growth and higher external reward than the baseline models, which implies improved exploration in very sparse reward environments. In addition, the analytical methods we applied provide valuable explanatory insights into our proposed models.
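The idea of using distillation error as a novelty signal can be illustrated with a minimal sketch. It is not the authors' implementation: the linear "networks", the squared-error reward, the dimensions, and every name below are assumptions made for illustration. The target is also kept fixed here, whereas SND additionally trains the target with a self-supervised objective, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, FEAT_DIM = 8, 4  # hypothetical sizes, not taken from the paper

# Linear stand-ins for the target and predictor networks (assumption).
W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))
W_pred = np.zeros((OBS_DIM, FEAT_DIM))

def intrinsic_reward(W_pred, W_target, obs):
    """Distillation error per observation: half the squared feature distance."""
    diff = obs @ W_pred - obs @ W_target
    return 0.5 * np.sum(diff ** 2, axis=1)

def distill_step(W_pred, W_target, obs, lr=0.05):
    """One gradient step that moves the predictor toward the target on a batch."""
    diff = obs @ W_pred - obs @ W_target
    return W_pred - lr * obs.T @ diff / len(obs)

# "Familiar" states vary only in the first four dimensions,
# "novel" states only in the last four.
mask = np.array([1.0] * 4 + [0.0] * 4)
familiar = rng.normal(size=(64, OBS_DIM)) * mask
novel = rng.normal(size=(64, OBS_DIM)) * (1.0 - mask)

before = intrinsic_reward(W_pred, W_target, familiar).mean()
for _ in range(200):
    W_pred = distill_step(W_pred, W_target, familiar)

after_familiar = intrinsic_reward(W_pred, W_target, familiar).mean()
after_novel = intrinsic_reward(W_pred, W_target, novel).mean()
print(f"familiar before: {before:.4f}")
print(f"familiar after:  {after_familiar:.4f}")
print(f"novel after:     {after_novel:.4f}")
```

After the predictor has been distilled on the familiar states, its error on them shrinks, while states it has never seen still produce a large error. That gap is what makes the error usable as an exploration bonus.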

Benchmark Results

Dataset                          Model     Metric  Claimed  Verified  Status
Atari 2600 Gravitar              SND-VIC   Score   6,712              Unverified
Atari 2600 Gravitar              SND-STD   Score   4,643              Unverified
Atari 2600 Gravitar              SND-V     Score   2,741              Unverified
Atari 2600 Montezuma's Revenge   SND-V     Score   21,565             Unverified
Atari 2600 Montezuma's Revenge   SND-VIC   Score   7,838              Unverified
Atari 2600 Montezuma's Revenge   SND-STD   Score   7,212              Unverified
Atari 2600 Pitfall!              SND-V     Score   0                  Unverified
Atari 2600 Pitfall!              SND-VIC   Score   0                  Unverified
Atari 2600 Private Eye           SND-V     Score   4,213              Unverified
Atari 2600 Private Eye           SND-VIC   Score   17,313             Unverified
Atari 2600 Private Eye           SND-STD   Score   15,089             Unverified
Atari 2600 Solaris               SND-STD   Score   12,460             Unverified
Atari 2600 Solaris               SND-VIC   Score   11,865             Unverified
Atari 2600 Solaris               SND-V     Score   11,582             Unverified
Atari 2600 Venture               SND-VIC   Score   2,188              Unverified
Atari 2600 Venture               SND-STD   Score   2,138              Unverified
Atari 2600 Venture               SND-V     Score   1,787              Unverified