Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning
Long Yang, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
Abstract
Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q(σ, λ) is the first approach that unifies them with eligibility traces through the sampling degree σ. However, it is limited to the tabular case: for large-scale learning, Q(σ, λ) is too expensive because it requires a huge volume of tables to store value functions accurately. To address this problem, we propose GQ(σ, λ), which extends tabular Q(σ, λ) with linear function approximation. We prove the convergence of GQ(σ, λ). Empirical results on some standard domains show that GQ(σ, λ) with a combination of full sampling and pure expectation achieves better performance than either full-sampling or pure-expectation methods alone.
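As a rough illustration of how a sampling degree σ can blend the two kinds of backup under linear function approximation, the sketch below computes a one-step σ-mixed TD error with Q(s, a) = θᵀφ(s, a). It is not the paper's GQ(σ, λ) update: it omits eligibility traces and the gradient correction term, and it uses a plain semi-gradient step instead. The function names, argument layout, and step size are all hypothetical.

```python
import numpy as np

def sigma_td_error(theta, phi_sa, reward, phi_next_all, next_action,
                   pi_next, sigma, gamma=0.99):
    """One-step sigma-mixed TD error with linear values Q(s, a) = theta @ phi(s, a).

    phi_next_all: array of shape (num_actions, num_features), one row per action in s'.
    pi_next:      target-policy probabilities over actions in s'.
    """
    q_sa = theta @ phi_sa
    q_next = phi_next_all @ theta      # value of every action in s'
    sampled = q_next[next_action]      # full-sampling backup (sigma = 1)
    expected = pi_next @ q_next        # pure-expectation backup (sigma = 0)
    target = reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)
    return target - q_sa

def semi_gradient_step(theta, phi_sa, delta, alpha=0.1):
    """Assumed semi-gradient update; stands in for the paper's gradient-TD update."""
    return theta + alpha * delta * phi_sa
```

Setting `sigma=1.0` recovers a purely sampled backup and `sigma=0.0` a purely expected one; intermediate values interpolate linearly between the two targets.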