
ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control

2024-10-07

Ehsan Futuhi, Shayan Karimi, Chao Gao, Martin Müller


Abstract

We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, εt-greedy, which generates exploratory options for visiting less-explored states. We prove that search using εt-greedy has polynomial sample complexity under mild MDP assumptions. To use the information provided by rewarded transitions more efficiently, we develop a new dual experience replay buffer framework, GDRB, and implement longest n-step returns. The resulting algorithm, ETGL-DDPG, integrates all three techniques, εt-greedy, GDRB, and the longest n-step return, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further show how each strategy individually improves the performance of DDPG in this setting.
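The abstract gives no implementation details, so the following is only a rough illustrative sketch of the two replay-related ideas as they might be read from the text: a dual buffer that keeps the rare rewarded transitions separate from ordinary ones, and a return computed over the entire remaining trajectory as the "longest" n-step return. The names (DualReplayBuffer, longest_n_step_return), the reward-based split, the sampling fraction, and the capacities are all hypothetical and are not taken from the paper.

import random
from collections import deque

class DualReplayBuffer:
    """Hypothetical two-buffer store: rare rewarded transitions are kept
    separately so they are not crowded out by the far more common
    zero-reward transitions in a sparse-reward task."""

    def __init__(self, capacity=100_000, rewarded_capacity=10_000):
        self.regular = deque(maxlen=capacity)            # zero-reward transitions
        self.rewarded = deque(maxlen=rewarded_capacity)  # transitions with nonzero reward

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        (self.rewarded if reward != 0 else self.regular).append(transition)

    def sample(self, batch_size, rewarded_fraction=0.25):
        # Draw a fixed fraction of each batch from the rewarded buffer when possible;
        # the actual GDRB sampling rule in the paper may differ.
        n_rewarded = min(int(batch_size * rewarded_fraction), len(self.rewarded))
        n_regular = min(batch_size - n_rewarded, len(self.regular))
        return (random.sample(list(self.rewarded), n_rewarded)
                + random.sample(list(self.regular), n_regular))

def longest_n_step_return(rewards, gamma=0.99):
    """Discounted return over the entire remaining trajectory, i.e. the
    largest possible n; one plausible reading of 'longest n-step returns'."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g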
