
LOCO: Adaptive exploration in reinforcement learning via local estimation of contraction coefficients

2021-03-09 · ICLR Workshop SSL-RL 2021

Manfred Diaz, Liam Paull, Pablo Samuel Castro


Abstract

We offer a novel approach to balance exploration and exploitation in reinforcement learning (RL). To do so, we characterize an environment’s exploration difficulty via the Second Largest Eigenvalue Modulus (SLEM) of the Markov chain induced by uniform stochastic behaviour. Specifically, we investigate the connection of state-space coverage with the SLEM of this Markov chain and use the theory of contraction coefficients to derive estimates of this eigenvalue of interest. Furthermore, we introduce a method for estimating the contraction coefficients on a local level and leverage it to design a novel exploration algorithm. We evaluate our algorithm on a series of GridWorld tasks of varying sizes and complexity.
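To make the quantities in the abstract concrete, the sketch below builds the Markov chain induced by a uniformly random policy on a small grid and bounds its SLEM with a contraction coefficient. It is an illustration under stated assumptions, not the paper's algorithm: the grid dynamics (four actions, bumping into a wall leaves the agent in place), the helper names, and the choice of the Dobrushin (total-variation) coefficient are all assumptions introduced here. It also uses the full transition matrix, whereas the paper estimates contraction coefficients locally. The sketch relies on the standard fact that the Dobrushin coefficient of P^k, raised to the power 1/k, upper-bounds the SLEM and approaches it as k grows.

```python
from itertools import product


def uniform_random_walk(n):
    """Transition matrix of a uniformly random policy on an n x n grid.

    Assumed dynamics (not from the paper): four actions, each chosen with
    probability 1/4; a move that would leave the grid keeps the agent in place.
    """
    states = list(product(range(n), range(n)))
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for (r, c), i in index.items():
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            j = index.get((r + dr, c + dc), i)  # off-grid move: stay put
            P[i][j] += 0.25
    return P


def matmul(A, B):
    """Plain-Python product of two square matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def dobrushin(P):
    """Dobrushin contraction coefficient: the largest total-variation
    distance between any two rows of P."""
    return max(
        0.5 * sum(abs(a - b) for a, b in zip(ri, rj)) for ri in P for rj in P
    )


def slem_upper_bound(P, k):
    """Upper bound on the SLEM: dobrushin(P^k) ** (1 / k)."""
    Pk = P
    for _ in range(k - 1):
        Pk = matmul(Pk, P)
    return dobrushin(Pk) ** (1.0 / k)


if __name__ == "__main__":
    for n in (2, 3, 4):
        P = uniform_random_walk(n)
        print(n, slem_upper_bound(P, 16))
```

A bound closer to 1 indicates a chain that mixes more slowly under uniform random behaviour, which is the sense in which the abstract ties the SLEM to exploration difficulty and state-space coverage.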
