Double Q-learning: New Analysis and Sharper Finite-time Bound
Lin Zhao, Huaqing Xiong, Yingbin Liang, Wei Zhang
Abstract
Double Q-learning (Hasselt 2010) has gained significant success in practice due to its effectiveness in overcoming the overestimation issue of Q-learning. However, the theoretical understanding of double Q-learning is rather limited, and the only existing finite-time analysis was recently established in (Xiong et al. 2020) under a polynomial learning rate. This paper analyzes the more challenging case of a rescaled linear learning rate, for which the previous method does not appear to be applicable. We develop new analytical tools that achieve an order-level better finite-time convergence rate than the previously established result. Specifically, we show that synchronous double Q-learning attains an $\epsilon$-accurate global optimum with a time complexity of $\tilde{\Omega}\left(\frac{\ln D}{(1-\gamma)^6\epsilon^2}+\frac{\ln D}{(1-\gamma)^7\epsilon^2}\right)$, and the asynchronous algorithm attains a time complexity of $\tilde{\Omega}\left(\frac{L^6\ln D}{(1-\gamma)^7\epsilon^2}\right)$, where $D$ is the cardinality of the state-action space, $\gamma$ is the discount factor, and $L$ is a parameter related to the sampling strategy for asynchronous double Q-learning. These results improve the order-level dependence of the convergence rate on all major parameters ($\epsilon$, $1-\gamma$, $D$, $L$) over those provided in (Xiong et al. 2020). The new analysis in this paper presents a more direct and succinct approach for characterizing the finite-time convergence rate of double Q-learning.
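For readers unfamiliar with the algorithm the abstract analyzes, the following is a rough NumPy sketch of synchronous tabular double Q-learning on a randomly generated toy MDP. It is an illustration only, not the paper's exact setup: the MDP, the coin-flip update scheme, and the particular rescaled-linear learning-rate constants (here $\alpha_t = 1/(1+(1-\gamma)t)$) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP for illustration: S states, A actions,
# random transition kernel P[s, a, s'] and rewards R[s, a].
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(S, A))

# Two Q-tables, the defining feature of double Q-learning.
QA = np.zeros((S, A))
QB = np.zeros((S, A))

for t in range(5000):
    # One common form of a rescaled linear learning rate (assumed here;
    # the paper's exact schedule may differ): alpha_t = 1 / (1 + (1-gamma) * t).
    alpha = 1.0 / (1.0 + (1.0 - gamma) * t)

    # Synchronous setting: draw a next-state sample for every (s, a) pair.
    next_states = np.array(
        [[rng.choice(S, p=P[s, a]) for a in range(A)] for s in range(S)]
    )

    if rng.random() < 0.5:
        # Update QA: select the greedy action with QA, but evaluate it with QB.
        # Decoupling selection from evaluation mitigates overestimation.
        a_star = QA[next_states].argmax(axis=-1)                     # shape (S, A)
        boot = np.take_along_axis(QB[next_states], a_star[..., None], axis=-1)[..., 0]
        QA += alpha * (R + gamma * boot - QA)
    else:
        # Symmetric update for QB, evaluating QB's greedy action with QA.
        a_star = QB[next_states].argmax(axis=-1)
        boot = np.take_along_axis(QA[next_states], a_star[..., None], axis=-1)[..., 0]
        QB += alpha * (R + gamma * boot - QB)
```

Since both tables estimate the same optimal Q-function, QA and QB should roughly agree after training; the asynchronous variant discussed in the abstract would instead update a single (state, action) entry per step along a sampled trajectory.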