Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis
Ruiquan Huang, Donghao Li, Chengshuai Shi, Cong Shen, Jing Yang
Abstract
This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis and show that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-of-the-art results under two learning metrics, i.e., sub-optimality gap and online learning regret. Specifically, we show that our algorithm achieves a sub-optimality gap $\tilde{O}\big(\sqrt{1/(N_0/C(\pi^*|\rho)+N_1)}\big)$, where $C(\pi^*|\rho)$ is a new concentrability coefficient, and $N_0$ and $N_1$ are the numbers of offline and online samples, respectively. For regret minimization, we show that it achieves a constant $\tilde{O}\big(\sqrt{N_1/(N_0/C(\pi^-|\rho)+N_1)}\big)$ speed-up compared to pure online learning, where $C(\pi^-|\rho)$ is the concentrability coefficient over all sub-optimal policies. Our results also reveal an interesting separation in the desired coverage properties of the offline dataset for sub-optimality gap minimization and regret minimization. We further validate our theoretical findings through experiments on special RL models such as linear contextual bandits and Markov decision processes (MDPs).
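To make the high-level idea concrete, the following is a minimal sketch, not the paper's algorithm: a confidence-based online learner (LinUCB, in the linear contextual bandit setting mentioned above) whose sufficient statistics are warm-started with an offline dataset, so that better offline coverage directly shrinks the confidence widths paid for during online learning. The class name, constants, and simulation setup are illustrative assumptions.

```python
# Illustrative sketch only: warm-starting a confidence-based online learner
# (LinUCB for a linear contextual bandit) with N_0 offline samples before
# N_1 rounds of online interaction. Not the paper's algorithm.
import numpy as np


class OfflineWarmStartLinUCB:
    def __init__(self, d, lam=1.0, beta=1.0):
        self.beta = beta               # confidence-width multiplier (assumed constant)
        self.A = lam * np.eye(d)       # regularized Gram matrix of observed features
        self.b = np.zeros(d)           # feature-weighted cumulative rewards

    def ingest_offline(self, X, r):
        """Fold offline (feature, reward) pairs into the same sufficient statistics
        used online; well-covered directions get tighter confidence intervals."""
        self.A += X.T @ X
        self.b += X.T @ r

    def select(self, arm_features):
        """Pick the arm with the largest upper confidence bound."""
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b
        widths = np.sqrt(np.einsum("ij,jk,ik->i", arm_features, A_inv, arm_features))
        return int(np.argmax(arm_features @ theta_hat + self.beta * widths))

    def update(self, x, r):
        """Standard online update after observing reward r for chosen features x."""
        self.A += np.outer(x, x)
        self.b += r * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_offline, n_online, n_arms = 5, 500, 200, 10
    theta_star = rng.normal(size=d) / np.sqrt(d)   # unknown reward parameter

    agent = OfflineWarmStartLinUCB(d)
    X_off = rng.normal(size=(n_offline, d))        # offline behavior-policy features
    agent.ingest_offline(X_off, X_off @ theta_star + 0.1 * rng.normal(size=n_offline))

    regret = 0.0
    for _ in range(n_online):
        arms = rng.normal(size=(n_arms, d))
        k = agent.select(arms)
        reward = arms[k] @ theta_star + 0.1 * rng.normal()
        agent.update(arms[k], reward)
        regret += np.max(arms @ theta_star) - arms[k] @ theta_star
    print(f"cumulative regret over {n_online} online rounds: {regret:.2f}")
```

Rerunning the toy simulation with n_offline set to 0 versus a large value gives a rough feel for the speed-up effect described above: the warm-started learner explores less in directions the offline data already covers.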