SOTAVerified

Variance-reduced Q-learning is minimax optimal

2019-06-11Unverified0· sign in to hype

Martin J. Wainwright

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

We introduce and analyze a form of variance-reduced Q-learning. For -discounted MDPs with finite state space X and action space U, we prove that it yields an -accurate estimate of the optimal Q-function in the _-norm using O ((D ^2 (1-)^3 ) \; ( D(1-) ) ) samples, where D = |X| |U|. This guarantee matches known minimax lower bounds up to a logarithmic factor in the discount complexity. In contrast, our past work shows that ordinary Q-learning has worst-case quartic scaling in the discount complexity.

Tasks

Reproductions