Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
Yuling Yan, Gen Li, Yuxin Chen, Jianqing Fan
Abstract
This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a $\gamma$-discounted infinite-horizon Markov game with $S$ states, where the max-player has $A$ actions and the min-player has $B$ actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game -- that provably finds an $\varepsilon$-approximate Nash equilibrium with a sample complexity no larger than $\frac{C_{\mathsf{clipped}}^{\star} S(A+B)}{(1-\gamma)^{3}\varepsilon^{2}}$ (up to some log factor). Here, $C_{\mathsf{clipped}}^{\star}$ is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy $\varepsilon$ can be any value within $\big(0, \frac{1}{1-\gamma}\big]$. Our sample complexity bound strengthens prior art by a factor of $\min\{A,B\}$, achieving minimax optimality for the entire $\varepsilon$-range. An appealing feature of our result lies in its algorithmic simplicity, which shows that neither variance reduction nor sample splitting is necessary for achieving sample optimality.
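To make the pessimism principle concrete, below is a minimal, hypothetical sketch of a value-iteration sweep that subtracts a Bernstein-style penalty (a variance term plus a lower-order range term) from empirical transition estimates built from offline data. For readability it is a single-player simplification: the paper's VI-LCB-Game instead solves a matrix game between the max- and min-player at every state. All names, the toy MDP, and the penalty constants are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import math

# Hypothetical toy offline dataset summary (illustrative numbers only):
# P_hat[s][a] : empirical next-state distribution estimated from offline data
# N[s][a]     : number of offline samples observed for the pair (s, a)
# r[s][a]     : known reward in [0, 1]
gamma = 0.9
states = [0, 1]
actions = [0, 1]
P_hat = {0: {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}},
         1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}}
N = {0: {0: 50, 1: 40}, 1: {0: 60, 1: 30}}
r = {0: {0: 1.0, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}
V_max = 1.0 / (1.0 - gamma)  # values lie in [0, 1/(1 - gamma)]

def bernstein_penalty(n, var, delta=0.01):
    """Bernstein-style bonus: sqrt(variance) term plus a lower-order
    range term, shrinking as the sample count n grows."""
    log_term = math.log(1.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + V_max * log_term / (3.0 * n)

def pessimistic_backup(V):
    """One pessimistic value-iteration sweep: each Q-value is penalized
    by a Bernstein-style lower confidence bound before maximizing."""
    V_new = {}
    for s in states:
        q_vals = []
        for a in actions:
            mean = sum(p * V[t] for t, p in P_hat[s][a].items())
            var = sum(p * (V[t] - mean) ** 2 for t, p in P_hat[s][a].items())
            b = bernstein_penalty(N[s][a], var)
            q = r[s][a] + gamma * (mean - b)        # pessimistic estimate
            q_vals.append(max(0.0, min(q, V_max)))  # clip to the valid range
        V_new[s] = max(q_vals)
    return V_new

V = {s: 0.0 for s in states}
for _ in range(200):  # iterate the pessimistic operator until stable
    V = pessimistic_backup(V)
```

The penalty makes the algorithm trust poorly covered state-action pairs less, which is how the clipped concentrability coefficient enters the analysis: only the coverage of the data relative to the target policies matters, not uniform coverage of all pairs.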