Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay

2021-09-22 · NeurIPS Workshop ICBINB 2021

Iryna Korshunova, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel, Edward Grefenstette

Abstract

Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample-efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels. PLR selectively samples training levels weighted by a function of recent temporal-difference (TD) errors experienced on each level. We explore the dispersion of returns as an alternative prioritization criterion to address certain issues with TD error scores.
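As a rough illustration of the idea in the abstract, the sketch below scores each level by the dispersion of its recent episode returns and samples levels in proportion to a function of that score. It is not the authors' implementation. The abstract does not say which dispersion measure is used, so the population standard deviation here is an assumption, as are the rank-based weighting, the `window` and `temperature` parameters, and the class name `DispersionLevelSampler`.

```python
import random
import statistics
from collections import deque


class DispersionLevelSampler:
    """Toy sampler: levels with more spread in recent returns are replayed more often.

    Hypothetical sketch. The dispersion measure (population std) and the
    rank-based weighting are assumptions, not taken from the paper.
    """

    def __init__(self, num_levels, window=10, temperature=1.0, seed=0):
        self.num_levels = num_levels
        self.temperature = temperature
        # Keep only the most recent `window` returns for each level.
        self.returns = {lvl: deque(maxlen=window) for lvl in range(num_levels)}
        self.rng = random.Random(seed)

    def record(self, level, episode_return):
        """Store the return of one finished episode on `level`."""
        self.returns[level].append(episode_return)

    def score(self, level):
        """Dispersion score: std of recent returns, 0.0 for unseen levels."""
        history = self.returns[level]
        return statistics.pstdev(history) if history else 0.0

    def probabilities(self):
        """Turn scores into sampling probabilities via their rank."""
        scores = [self.score(lvl) for lvl in range(self.num_levels)]
        # Rank 1 goes to the level with the highest dispersion.
        order = sorted(range(self.num_levels), key=lambda lvl: scores[lvl], reverse=True)
        weights = [0.0] * self.num_levels
        for rank, lvl in enumerate(order, start=1):
            weights[lvl] = (1.0 / rank) ** (1.0 / self.temperature)
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self):
        """Draw the next training level."""
        levels = range(self.num_levels)
        return self.rng.choices(levels, weights=self.probabilities(), k=1)[0]


sampler = DispersionLevelSampler(num_levels=3)
for r in [1.0, 1.0, 1.0]:
    sampler.record(0, r)   # solved consistently: no spread
for r in [0.0, 10.0, 0.0, 10.0]:
    sampler.record(1, r)   # outcomes vary widely: high spread
for r in [4.0, 5.0, 6.0]:
    sampler.record(2, r)   # mild spread
probs = sampler.probabilities()
next_level = sampler.sample()
```

In this toy run, level 1 receives the largest sampling probability because its returns vary the most, and level 0 the smallest. A TD-error-based variant would keep the same sampling logic and change only `score`.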
