
Provably Efficient Reinforcement Learning with Aggregated States

2019-12-13

Shi Dong, Benjamin Van Roy, Zhengyuan Zhou


Abstract

We establish that an optimistic variant of Q-learning applied to a fixed-horizon episodic Markov decision process with an aggregated state representation incurs regret Õ(√(H⁵MK) + εHK), where H is the horizon, M is the number of aggregate states, K is the number of episodes, and ε is the largest difference between any pair of optimal state-action values associated with a common aggregate state. Notably, this regret bound does not depend on the number of states or actions, and it indicates that asymptotic per-period regret is no greater than ε, independent of horizon. To our knowledge, this is the first such result that applies to reinforcement learning with nontrivial value function approximation without any restrictions on transition probabilities.
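To make the setting concrete, the following is a minimal toy sketch of optimistic Q-learning with an aggregated state representation. It is illustrative only, not the paper's exact algorithm: the chain MDP, the aggregation map `phi`, the step-size schedule, and the (scaled-down) exploration bonus are all assumptions chosen for the example. Q-values are maintained per (step, aggregate state, action) rather than per primitive state, so memory scales with M instead of the number of states.

```python
import math

def aggregated_q_learning(K=2000, H=5):
    """Toy sketch (illustrative, not the paper's exact algorithm):
    optimistic Q-learning on a small chain MDP whose primitive states
    are mapped onto M aggregate states by a hand-chosen map phi."""
    n_states, n_actions, M = 4, 2, 2
    phi = lambda s: s // 2  # aggregation: states {0,1} -> 0, {2,3} -> 1

    def step(s, a):
        # Action 1 moves right, action 0 moves left; reward is earned
        # only by taking action 1 in the rightmost state.
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        return s2, (1.0 if (a == 1 and s == n_states - 1) else 0.0)

    # Optimistic initialization at H, the maximum achievable return.
    Q = [[[float(H)] * n_actions for _ in range(M)] for _ in range(H)]
    N = [[[0] * n_actions for _ in range(M)] for _ in range(H)]

    total = 0.0
    for _ in range(K):
        s = 0
        for h in range(H):
            m = phi(s)
            a = max(range(n_actions), key=lambda x: Q[h][m][x])
            s2, r = step(s, a)
            total += r
            N[h][m][a] += 1
            t = N[h][m][a]
            alpha = (H + 1) / (H + t)               # step size of the H/(H+t) flavor
            bonus = math.sqrt(math.log(K + 1) / t)  # exploration bonus (constants scaled down for the toy)
            v_next = max(Q[h + 1][phi(s2)]) if h + 1 < H else 0.0
            Q[h][m][a] = min((1 - alpha) * Q[h][m][a] + alpha * (r + v_next + bonus), H)
            s = s2

    # Greedy rollout with the learned aggregated Q-values (no bonus).
    s, greedy_return = 0, 0.0
    for h in range(H):
        a = max(range(n_actions), key=lambda x: Q[h][phi(s)][x])
        s, r = step(s, a)
        greedy_return += r
    return total / K, greedy_return
```

In this toy the two primitive states inside each aggregate state have different optimal values (so ε > 0) but share the same optimal action, which is the regime where aggregation loses little: the per-episode regret of the greedy aggregated policy is bounded by a term proportional to ε, matching the εHK term in the bound above.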
