
Constrained Upper Confidence Reinforcement Learning

2020-01-26

Liyuan Zheng, Lillian J. Ratliff


Abstract

Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well motivated by a number of applications, including exploration of unknown, potentially unsafe, environments. We present an algorithm, C-UCRL, and show that it achieves sub-linear regret O(T^{3/4} √(log(T/δ))) with respect to the reward while satisfying the constraints with probability 1-δ, even while learning. Illustrative examples are provided.
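To make the abstract's idea concrete, the sketch below shows the general "optimism for reward, pessimism for cost" pattern in a simplified bandit-style setting. This is an illustrative assumption on my part, not the paper's C-UCRL algorithm: the actual algorithm operates over a known transition kernel in a constrained MDP, while the names (`c_ucb_step`, `budget`, `delta`) and the specific confidence-bonus form here are hypothetical.

```python
import math

def c_ucb_step(counts, reward_sums, cost_sums, budget, t, delta=0.05):
    """One action selection of an illustrative constrained-UCB rule
    (a bandit-style sketch, NOT the paper's full C-UCRL algorithm).

    Rewards use an optimistic upper confidence bound; costs also use an
    upper bound, which is pessimistic for constraint satisfaction, so the
    cost constraint is respected with high probability while the true
    reward and cost functions are still being learned."""
    best_a, best_val = None, -float("inf")
    for a in counts:
        n = counts[a]
        if n == 0:
            return a  # try every action at least once
        # Hoeffding-style confidence bonus shrinking at rate 1/sqrt(n)
        bonus = math.sqrt(2 * math.log(t / delta) / n)
        r_ucb = reward_sums[a] / n + bonus  # optimistic reward estimate
        c_ucb = cost_sums[a] / n + bonus    # pessimistic (upper) cost estimate
        if c_ucb <= budget and r_ucb > best_val:
            best_a, best_val = a, r_ucb
    if best_a is None:
        # no action looks safe: fall back to the lowest-upper-cost action
        best_a = min(
            counts,
            key=lambda a: cost_sums[a] / counts[a]
            + math.sqrt(2 * math.log(t / delta) / counts[a]),
        )
    return best_a
```

The key design point mirrored from the abstract is that exploration bonuses are applied asymmetrically: they inflate reward estimates (to encourage exploration) but also inflate cost estimates (to stay conservative about the constraint), which is what allows a sub-linear regret bound to coexist with high-probability constraint satisfaction.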
