SOTAVerified

Exploiting Estimation Bias in Clipped Double Q-Learning for Continous Control Reinforcement Learning Tasks

2024-02-14Unverified0· sign in to hype

Niccolò Turcato, Alberto Sinigaglia, Alberto Dalla Libera, Ruggero Carli, Gian Antonio Susto

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

Continuous control Deep Reinforcement Learning (RL) approaches are known to suffer from estimation biases, leading to suboptimal policies. This paper introduces innovative methods in RL, focusing on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks, using Deep Double Q-Learning. We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent. Most State-of-the-art Deep RL algorithms can be equipped with the BE mechanism, without hindering performance or computational complexity. Our extensive experiments across various continuous control tasks demonstrate the effectiveness of our approaches. We show that RL algorithms equipped with this method can match or surpass their counterparts, particularly in environments where estimation biases significantly impact learning. The results underline the importance of bias exploitation in improving policy learning in RL.

Tasks

Reproductions