Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms
2017-11-05
Markus Dumke
Abstract
Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments suggest that the new Q(σ, λ) algorithm can outperform the classical TD control methods Sarsa(λ), Q(λ) and Q(σ).
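To illustrate the unification the abstract describes, below is a minimal sketch of the one-step Q(σ) backup target in tabular form. The function name, the toy Q-table, and the target policy are illustrative assumptions, not the paper's code: σ = 1 recovers the Sarsa target, σ = 0 recovers the Expected-Sarsa (tree-backup) target, and intermediate σ interpolates between them.

```python
def q_sigma_target(Q, r, s_next, a_next, sigma, gamma, policy):
    """One-step Q(sigma) backup target (illustrative sketch).

    Q      : Q[s][a] table of action values
    policy : policy[s][a] target-policy probabilities
    sigma  : 1.0 -> Sarsa target, 0.0 -> Expected-Sarsa target
    """
    # Sample-based component, as in Sarsa
    sarsa = Q[s_next][a_next]
    # Expectation over actions under the target policy, as in Expected Sarsa
    expected = sum(p * q for p, q in zip(policy[s_next], Q[s_next]))
    # Q(sigma) interpolates linearly between the two targets
    return r + gamma * (sigma * sarsa + (1.0 - sigma) * expected)


# Toy two-state, two-action example (values are made up for illustration)
Q = [[1.0, 2.0], [3.0, 4.0]]
policy = [[0.5, 0.5], [0.25, 0.75]]

t_sarsa = q_sigma_target(Q, r=0.0, s_next=1, a_next=0,
                         sigma=1.0, gamma=0.9, policy=policy)
t_exp = q_sigma_target(Q, r=0.0, s_next=1, a_next=0,
                       sigma=0.0, gamma=0.9, policy=policy)
t_half = q_sigma_target(Q, r=0.0, s_next=1, a_next=0,
                        sigma=0.5, gamma=0.9, policy=policy)
```

With σ = 0.5 the target is exactly the average of the Sarsa and Expected-Sarsa targets, which is the sense in which Q(σ) unifies the two classical algorithms.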