From Theory to Practice with RAVEN-UCB: Addressing Non-Stationarity in Multi-Armed Bandits through Variance Adaptation
Junyi Fang, Yuxun Chen, Yuxin Chen, Chen Zhang
Code: github.com/66661654/Raven-UCB (official)
Abstract
The Multi-Armed Bandit (MAB) problem is challenging in non-stationary environments where reward distributions evolve dynamically. We introduce RAVEN-UCB, a novel algorithm that combines theoretical rigor with practical efficiency via variance-aware adaptation. It achieves tighter regret bounds than UCB1 and UCB-V, with gap-dependent regret of order O(K σ_max² log T / Δ) and gap-independent regret of order O(√(K T log T)). RAVEN-UCB incorporates three innovations: (1) variance-driven exploration using σ̂_k² / (N_k + 1) in the confidence bounds, (2) adaptive exploration control via α_t = α_0 / log(t + c), and (3) constant-time recursive updates of the empirical mean and variance for computational efficiency. Experiments across non-stationary patterns (distributional changes, periodic shifts, and temporary fluctuations) in synthetic and logistics scenarios demonstrate superiority over state-of-the-art baselines, confirming both theoretical and practical robustness.
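The three ingredients named in the abstract can be sketched together in a few lines: a variance term σ̂_k² / (N_k + 1) inside the confidence bonus, a decaying coefficient α_t = α_0 / log(t + c), and Welford-style recursive updates so each step costs O(1). This is a hypothetical reconstruction from the abstract alone; the paper's exact bonus form, constants, and tie-breaking rules may differ.

```python
import math
import random

class RavenUCBSketch:
    """Sketch of a variance-aware UCB policy, reconstructed from the abstract.

    Not the official RAVEN-UCB implementation: constants (alpha0, c) and the
    exact shape of the confidence bonus are illustrative assumptions.
    """

    def __init__(self, n_arms, alpha0=1.0, c=2.0):
        self.n = [0] * n_arms        # pull counts N_k
        self.mean = [0.0] * n_arms   # empirical means
        self.m2 = [0.0] * n_arms     # sums of squared deviations (Welford)
        self.alpha0 = alpha0         # assumed initial exploration weight
        self.c = c                   # assumed offset keeping log(t + c) > 0
        self.t = 0                   # global round counter

    def select(self):
        self.t += 1
        # Pull each arm once before trusting any variance estimate.
        for k, nk in enumerate(self.n):
            if nk == 0:
                return k
        alpha_t = self.alpha0 / math.log(self.t + self.c)   # adaptive control
        best, best_score = 0, float("-inf")
        for k in range(len(self.n)):
            var_k = self.m2[k] / self.n[k]                  # sample variance
            bonus = alpha_t * math.sqrt(var_k / (self.n[k] + 1))
            score = self.mean[k] + bonus                    # variance-driven UCB
            if score > best_score:
                best, best_score = k, score
        return best

    def update(self, k, reward):
        # Constant-time recursive updates of mean and variance (Welford).
        self.n[k] += 1
        delta = reward - self.mean[k]
        self.mean[k] += delta / self.n[k]
        self.m2[k] += delta * (reward - self.mean[k])
```

A typical loop would call `select()` to choose an arm, observe a reward, and feed it back through `update()`; because both methods touch only per-arm scalars, each round is O(K) to score arms and O(1) to update.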