
BooVI: Provably Efficient Bootstrapped Value Iteration

NeurIPS 2021 · 2021-12-01

Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang

Abstract

Despite the tremendous success of reinforcement learning (RL) with function approximation, efficient exploration remains a significant challenge, both practically and theoretically. In particular, existing theoretically grounded RL algorithms based on upper confidence bounds (UCBs), such as optimistic least-squares value iteration (LSVI), are often incompatible with practically powerful function approximators, such as neural networks. In this paper, we develop a variant of bootstrapped LSVI, namely BooVI, which bridges such a gap between practice and theory. Practically, BooVI drives exploration through (re)sampling, making it compatible with general function approximators. Theoretically, BooVI inherits the worst-case Õ(√(d^3 H^3 T))-regret of optimistic LSVI in the episodic linear setting. Here d is the feature dimension, H is the episode horizon, and T is the total number of steps.
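The abstract's core contrast — exploration via (re)sampling rather than an explicit UCB bonus — can be sketched in the linear setting. The snippet below is a minimal illustration, not the paper's algorithm: it shows a standard ridge-regression value fit next to a bootstrapped variant that perturbs the regression targets with Gaussian noise, so that randomness in the fitted weights (rather than an added bonus term) supplies optimism. The function names, noise scale, and regularizer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ls_value_fit(phi, targets, lam=1.0):
    # Standard regularized least-squares fit used by LSVI-style methods:
    # w = (Phi^T Phi + lam * I)^{-1} Phi^T y
    d = phi.shape[1]
    A = phi.T @ phi + lam * np.eye(d)
    return np.linalg.solve(A, phi.T @ targets)

def bootstrapped_value_fit(phi, targets, noise_scale=1.0, lam=1.0):
    # Illustrative bootstrapped variant: resample by perturbing the
    # regression targets with Gaussian noise before fitting, so repeated
    # fits yield a spread of value estimates that can drive exploration.
    noisy_targets = targets + noise_scale * rng.standard_normal(len(targets))
    return ls_value_fit(phi, noisy_targets, lam)

# Toy data: features of dimension d, targets from a fixed linear model.
d, n = 3, 50
phi = rng.standard_normal((n, d))
w_true = np.array([1.0, -2.0, 0.5])
targets = phi @ w_true

w_plain = ls_value_fit(phi, targets)          # deterministic fit
w_boot_1 = bootstrapped_value_fit(phi, targets)  # randomized fit
w_boot_2 = bootstrapped_value_fit(phi, targets)  # a different draw
```

Because each bootstrapped fit uses freshly perturbed targets, the resulting Q-estimates vary across draws; the UCB-style approach instead adds an explicit bonus to a single deterministic fit. This resampling is what makes the approach compatible with general function approximators, which need only be refit on perturbed data.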
