
Optimistic Policy Optimization with General Function Approximations

2021-01-01

Qi Cai, Zhuoran Yang, Csaba Szepesvari, Zhaoran Wang

Abstract

Although policy optimization with neural networks has a track record of achieving state-of-the-art results in reinforcement learning across various domains, the theoretical understanding of its computational and sample efficiency remains restricted to linear function approximations with finite-dimensional feature representations, which hinders the design of principled, effective, and efficient algorithms. To this end, we propose an optimistic policy optimization algorithm that allows general function approximations while incorporating exploration. In the episodic setting, we establish a $\sqrt{T}$-regret that scales polynomially in the eluder dimension of the general model class, where $T$ is the number of steps taken by the agent. In particular, we specialize this regret bound to two nonparametric model classes: one based on reproducing kernel Hilbert spaces and another based on overparameterized neural networks.
