SOTAVerified

Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

2026-03-14Unverified0· sign in to hype

Ruiyuan Huang, Zicheng Lyu, Xiaoyi Zhu, Zengfeng Huang

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in B batches and has only W bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret O(KT) is achievable with O( T) bits of memory under fully adaptive interaction, and with a K-independent O( T)-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a W-bit memory constraint must use at least Ω(K/W) batches to achieve near-minimax regret O(KT) , even under adaptive grids. In particular, logarithmic memory rules out K-independent batch complexity. Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire Ω(K) bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with B batches and W bits of memory allows only O(BW) bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm using O( T) bits of memory and O(K) batches that achieves regret O(KT), which nearly matches our lower bound.

Reproductions