Speedy Q-Learning

2011-12-01 · NeurIPS 2011

Mohammad Ghavamzadeh, Hilbert J. Kappen, Mohammad G. Azar, Rémi Munos

Abstract

We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that for an MDP with n state-action pairs and discount factor γ, only T = O(log(n)/(ε²(1−γ)⁴)) steps are required for the SQL algorithm to converge to an ε-optimal action-value function with high probability. This bound has a better dependency on 1/ε and 1/(1−γ), and thus is tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration, which are considered more efficient than incremental methods like Q-learning.
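The abstract does not spell out the update rule, so the following is only a sketch of the commonly cited SQL iteration with learning rate α_k = 1/(k+1), run on a synthetic two-state, two-action MDP; all transition probabilities, rewards, and the discount factor here are made up for illustration.

```python
import numpy as np

# Sketch of the speedy Q-learning (SQL) update on a toy 2-state, 2-action MDP.
# Assumed update (not given in the abstract above), with alpha_k = 1/(k+1):
#   Q_{k+1} = Q_k + a_k (T_k Q_{k-1} - Q_k) + (1 - a_k)(T_k Q_k - T_k Q_{k-1})
# where T_k is the empirical Bellman optimality operator built from one
# sampled next state per state-action pair.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # P[s, a, s']: transition probs (made up)
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]: immediate rewards (made up)

Q_prev = np.zeros((n_states, n_actions))   # Q_{k-1}
Q = np.zeros((n_states, n_actions))        # Q_k

for k in range(2000):
    alpha = 1.0 / (k + 1)
    TQ_prev = np.empty_like(Q)             # empirical T_k applied to Q_{k-1}
    TQ = np.empty_like(Q)                  # empirical T_k applied to Q_k
    for s in range(n_states):
        for a in range(n_actions):
            s_next = rng.choice(n_states, p=P[s, a])   # one sampled next state
            TQ_prev[s, a] = R[s, a] + gamma * Q_prev[s_next].max()
            TQ[s, a] = R[s, a] + gamma * Q[s_next].max()
    Q_next = Q + alpha * (TQ_prev - Q) + (1 - alpha) * (TQ - TQ_prev)
    Q_prev, Q = Q, Q_next

print(np.round(Q, 2))
```

Note that both empirical operators reuse the same sampled next state, so the aggressive (1 − α_k) weight on the difference term adds no extra sampling cost over standard Q-learning.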
