A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning

2025-05-18Unverified0· sign in to hype

Zachary Roch, Chi Zhang, George Atia, Yue Wang

Unverified — Be the first to reproduce this paper.

Abstract

Robust reinforcement learning (RL) under the average-reward criterion is crucial for long-term decision making under potential environment mismatches, yet its finite-sample complexity study remains largely unexplored. Existing works offer algorithms with asymptotic guarantees, but the absence of finite-sample analysis hinders its principled understanding and practical deployment, especially in data-limited settings. We close this gap by proposing Robust Halpern Iteration (RHI), the first algorithm with provable finite-sample complexity guarantee. Under standard uncertainty sets -- including contamination sets and _p-norm balls -- RHI attains an -optimal policy with near-optimal sample complexity of O(SA H^2^2), where S and A denote the numbers of states and actions, and H is the robust optimal bias span. This result gives the first polynomial sample complexity guarantee for robust average-reward RL. Moreover, our RHI's independence from prior knowledge distinguishes it from many previous average-reward RL studies. Our work thus constitutes a significant advancement in enhancing the practical applicability of robust average-reward methods to complex, real-world problems.

Tasks

Reinforcement Learning (RL)

A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning

Abstract

Tasks

Reproductions