SOTAVerified

From Theory to Practice with RAVEN-UCB: Addressing Non-Stationarity in Multi-Armed Bandits through Variance Adaptation

2025-06-03Code Available0· sign in to hype

Junyi Fang, Yuxun Chen, Yuxin Chen, Chen Zhang

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

The Multi-Armed Bandit (MAB) problem is challenging in non-stationary environments where reward distributions evolve dynamically. We introduce RAVEN-UCB, a novel algorithm that combines theoretical rigor with practical efficiency via variance-aware adaptation. It achieves tighter regret bounds than UCB1 and UCB-V, with gap-dependent regret of order K _^2 T / and gap-independent regret of order K T T. RAVEN-UCB incorporates three innovations: (1) variance-driven exploration using _k^2 / (N_k + 1) in confidence bounds, (2) adaptive control via _t = _0 / (t + ), and (3) constant-time recursive updates for efficiency. Experiments across non-stationary patterns - distributional changes, periodic shifts, and temporary fluctuations - in synthetic and logistics scenarios demonstrate its superiority over state-of-the-art baselines, confirming theoretical and practical robustness.

Tasks

Reproductions