SOTAVerified

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

2020-02-17Unverified0· sign in to hype

Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

The current paper studies the problem of agnostic Q-learning with function approximation in deterministic systems where the optimal Q-function is approximable by a function in the class F with approximation error 0. We propose a novel recursion-based algorithm and show that if = O(/_E), then one can find the optimal policy using O(_E) trajectories, where is the gap between the optimal Q-value of the best actions and that of the second-best actions and _E is the Eluder dimension of F. Our result has two implications: 1) In conjunction with the lower bound in [Du et al., ICLR 2020], our upper bound suggests that the condition = (/dim_E) is necessary and sufficient for algorithms with polynomial sample complexity. 2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity (dim_E) is tight even in the agnostic setting. Therefore, we settle the open problem on agnostic Q-learning proposed in [Wen and Van Roy, NIPS 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.

Tasks

Reproductions