Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms
2017-11-05
Markus Dumke
Abstract
Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments suggest that the new Q(σ, λ) algorithm can outperform the classical TD control methods Sarsa(λ), Q(λ) and Q(σ).
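To illustrate the unification the abstract describes, below is a minimal sketch of the one-step Q(σ) backup target in tabular form. The function name, the toy Q-table, and the target policy are illustrative assumptions, not the paper's code: σ = 1 recovers the Sarsa target, σ = 0 recovers the Expected-Sarsa (tree-backup) target, and intermediate σ interpolates between them.

```python
def q_sigma_target(Q, r, s_next, a_next, sigma, gamma, policy):
    """One-step Q(sigma) backup target (illustrative sketch).

    Q      : Q[s][a] table of action values
    policy : policy[s][a] target-policy probabilities
    sigma  : 1.0 -> Sarsa target, 0.0 -> Expected-Sarsa target
    """
    # Sample-based component, as in Sarsa
    sarsa = Q[s_next][a_next]
    # Expectation over actions under the target policy, as in Expected Sarsa
    expected = sum(p * q for p, q in zip(policy[s_next], Q[s_next]))
    # Q(sigma) interpolates linearly between the two targets
    return r + gamma * (sigma * sarsa + (1.0 - sigma) * expected)


# Toy two-state, two-action example (values are made up for illustration)
Q = [[1.0, 2.0], [3.0, 4.0]]
policy = [[0.5, 0.5], [0.25, 0.75]]

t_sarsa = q_sigma_target(Q, r=0.0, s_next=1, a_next=0,
                         sigma=1.0, gamma=0.9, policy=policy)
t_exp = q_sigma_target(Q, r=0.0, s_next=1, a_next=0,
                       sigma=0.0, gamma=0.9, policy=policy)
t_half = q_sigma_target(Q, r=0.0, s_next=1, a_next=0,
                        sigma=0.5, gamma=0.9, policy=policy)
```

With σ = 0.5 the target is exactly the average of the Sarsa and Expected-Sarsa targets, which is the sense in which Q(σ) unifies the two classical algorithms.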