Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

2024-02-03

Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung

Abstract

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce Q-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of Q-functions. By analyzing Q-function over-generalization, which impairs stable stitching, QCS adaptively integrates Q-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
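The abstract describes adding a Q-function term to the RCSL loss with a strength that depends on trajectory return. A minimal sketch of that idea follows. It is not the authors' implementation: the function names, the linear weight, the `scale` parameter, and the choice to give low-return trajectories more Q-aid are all assumptions made for illustration, since the abstract does not state the exact form.

```python
def q_aid_weight(traj_return, max_return, scale=1.0):
    """Hypothetical weight for the Q-aid term.

    Assumption: the weight grows as the trajectory return falls below
    max_return, and is zero for a trajectory that reaches max_return.
    """
    return scale * max(max_return - traj_return, 0.0)


def qcs_style_loss(pred_action, data_action, q_value,
                   traj_return, max_return, scale=1.0):
    """Illustrative per-sample loss: RCSL term minus weighted Q-aid.

    pred_action, data_action: equal-length sequences of floats.
    q_value: a critic's estimate Q(s, pred_action), supplied by the caller
             (assumed to come from a separately trained Q-function).
    """
    # RCSL term: mean squared error between predicted and dataset action.
    rcsl = sum((p - a) ** 2 for p, a in zip(pred_action, data_action))
    rcsl /= len(data_action)

    # Q-aid term: minimizing -w * Q pushes the policy toward high-Q actions.
    w = q_aid_weight(traj_return, max_return, scale)
    return rcsl - w * q_value


# A trajectory at the maximum return gets no Q-aid, so the loss is pure RCSL.
loss_best = qcs_style_loss([1.0, 2.0], [1.0, 2.0], q_value=3.0,
                           traj_return=100.0, max_return=100.0)

# A lower-return trajectory gets a nonzero Q-aid weight.
loss_low = qcs_style_loss([1.0, 0.0], [0.0, 0.0], q_value=2.0,
                          traj_return=90.0, max_return=100.0, scale=0.1)
```

Under these assumptions the loss reduces to plain RCSL on the best trajectories, where imitation alone is already reliable, and relies more on the Q-function where stitching is more likely to help.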

Tasks

Offline RL, Reinforcement Learning (RL)
