Linear Q-Learning Does Not Diverge: Convergence Rates to a Bounded Set
Xinyu Liu, Zixuan Xie, Shangtong Zhang
Abstract
Q-learning is one of the most fundamental reinforcement learning algorithms. Previously, it was widely believed that Q-learning with linear function approximation (i.e., linear Q-learning) can diverge. This paper instead establishes the first L^2 convergence rate of linear Q-learning to a bounded set. Notably, we do not modify the original linear Q-learning algorithm, nor do we make any Bellman completeness assumption or any near-optimality assumption on the behavior policy. All we need is an ε-softmax behavior policy with an adaptive temperature. The key to our analysis is a general result on stochastic approximation under Markovian noise with fast-changing transition functions. As a by-product, we also use this general result to establish the L^2 convergence rate of tabular Q-learning with an ε-softmax behavior policy, for which we rely on a novel pseudo-contraction property of the weighted Bellman optimality operator.
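To make the setting concrete, below is a minimal sketch of linear Q-learning with an ε-softmax behavior policy. The environment interface (reset/step), the feature map phi, and the particular temperature schedule (growing with the weight norm) are illustrative assumptions, not the paper's exact construction; the semi-gradient update itself is the standard linear Q-learning rule.

```python
# A minimal sketch (not the paper's exact algorithm): linear Q-learning with an
# eps-softmax behavior policy. env, phi, and the temperature schedule are
# illustrative assumptions.
import numpy as np

def softmax(x):
    z = x - x.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def eps_softmax_action(q_values, eps, tau, rng):
    """Mixture policy: uniform with prob. eps, softmax(q / tau) otherwise."""
    probs = eps / len(q_values) + (1 - eps) * softmax(q_values / tau)
    return rng.choice(len(q_values), p=probs)

def linear_q_learning(env, phi, num_actions, steps=10_000,
                      alpha=0.05, gamma=0.99, eps=0.1, seed=0):
    """env.reset() -> s; env.step(a) -> (s', r, done); phi(s, a) -> feature vector."""
    rng = np.random.default_rng(seed)
    s = env.reset()
    w = np.zeros(phi(s, 0).shape)
    for _ in range(steps):
        q = np.array([phi(s, a) @ w for a in range(num_actions)])
        # Hypothetical adaptive temperature: scales with ||w|| so the softmax
        # stays well-behaved as the weights grow (illustrative choice only).
        tau = 1.0 + np.linalg.norm(w)
        a = eps_softmax_action(q, eps, tau, rng)
        s_next, r, done = env.step(a)
        q_next = max(phi(s_next, b) @ w for b in range(num_actions))
        target = r + (0.0 if done else gamma * q_next)
        # Standard semi-gradient Q-learning update with linear features.
        w += alpha * (target - phi(s, a) @ w) * phi(s, a)
        s = env.reset() if done else s_next
    return w
```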