
Model Selection for Generic Reinforcement Learning

2021-07-13

Avishek Ghosh, Sayak Ray Chowdhury, Kannan Ramchandran


Abstract

We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem, where the transition kernel P^* belongs to a family of models 𝒫^* with finite metric entropy. In the model selection framework, instead of 𝒫^*, we are given M nested families of transition kernels 𝒫_1 ⊆ 𝒫_2 ⊆ … ⊆ 𝒫_M. We propose and analyze a novel algorithm, namely Adaptive Reinforcement Learning (General) (ARL-GEN), that adapts to the smallest such family where the true transition kernel P^* lies. ARL-GEN uses the Upper Confidence Reinforcement Learning (UCRL) algorithm with value-targeted regression as a blackbox and places a model selection module at the beginning of each epoch. Under a mild separability assumption on the model classes, we show that ARL-GEN obtains a regret of Õ(d_E^* H^2 + √(d_E^* 𝕄^* H^2 T)) with high probability, where H is the horizon length, T is the total number of steps, d_E^* is the Eluder dimension, and 𝕄^* is the metric entropy corresponding to 𝒫^*. Note that this regret scaling matches that of an oracle that knows 𝒫^* in advance. We show that the cost of model selection for ARL-GEN is an additive term in the regret with only a weak dependence on T. Subsequently, we remove the separability assumption and consider the setup of linear mixture MDPs, where the transition kernel P^* admits a linear function approximation. With this low-rank structure, we propose novel adaptive algorithms for model selection and obtain (order-wise) regret identical to that of an oracle with knowledge of the true model class.
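To make the epoch structure concrete, below is a minimal, self-contained Python sketch of the adaptive loop the abstract describes: a model-selection test at the start of each doubling epoch picks the smallest nested class consistent with the data observed so far, and a black-box base learner then runs within that class. Everything here (ModelClass, UCRLVTRStub, the threshold rule) is a hypothetical placeholder under stated assumptions, not the authors' implementation or API.

```python
import math
import random

class ModelClass:
    """Stand-in for one nested family of transition kernels P_m."""
    def __init__(self, complexity):
        self.complexity = complexity  # crude proxy for metric entropy / richness

    def best_fit_error(self, history):
        # Placeholder: in ARL-GEN this would be the empirical loss of the
        # best-fitting transition model in the class on the observed data.
        if not history:
            return float("inf")
        return 1.0 / self.complexity

class UCRLVTRStub:
    """Stand-in for the black-box base learner (UCRL with value-targeted regression)."""
    def __init__(self, model_class):
        self.model_class = model_class

    def run(self, num_steps):
        # Placeholder: would act optimistically in the MDP and log transitions.
        return [("s", "a", random.random(), "s_next") for _ in range(num_steps)]

def arl_gen(model_classes, num_epochs, base_epoch_len):
    """model_classes: nested families P_1 ⊆ P_2 ⊆ ... ⊆ P_M, smallest first."""
    history = []
    for i in range(num_epochs):
        # Model-selection module, run at the start of each epoch: pick the
        # smallest class whose best fit passes a (shrinking) threshold test.
        threshold = 1.0 / math.sqrt(max(len(history), 1))
        selected = next(
            (cls for cls in model_classes if cls.best_fit_error(history) <= threshold),
            model_classes[-1],  # fall back to the largest class
        )
        # Black-box phase: run the base learner within the selected class
        # for a doubling epoch, then fold the new transitions into the data.
        learner = UCRLVTRStub(selected)
        history.extend(learner.run(base_epoch_len * 2 ** i))
    return history

# Example: three nested classes of increasing complexity.
classes = [ModelClass(c) for c in (2, 8, 32)]
data = arl_gen(classes, num_epochs=4, base_epoch_len=10)
```

The doubling epoch lengths and the per-epoch (rather than per-step) selection test are meant to mirror the abstract's claim that the cost of model selection enters the regret only as an additive term with weak dependence on T; the particular threshold schedule above is an assumption for illustration.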
