Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap
Nikolai Karpov, Chen Wang
Abstract
We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with *a priori* knowledge of the optimality gap $\Delta_{[2]}$. Here, and throughout, the optimality gap $\Delta_{[i]}$ is defined as the gap between the mean rewards of the best and the $i$-th best arms. A recent line of results by Jin, Huang, Tang, and Xiao [ICML'21] and Assadi and Wang [COLT'24] has shown that if $\Delta_{[2]}$ is not known in advance, a pass complexity of $\Theta(\log(1/\Delta_{[2]}))$ (up to $\log\log(1/\Delta_{[2]})$ terms) is necessary and sufficient to obtain the *worst-case optimal* sample complexity of $O(n/\Delta_{[2]}^2)$ with a single-arm memory. However, our understanding of multi-pass algorithms with known $\Delta_{[2]}$ is still limited. Here, the key open problem is how many passes are required to achieve the instance-sensitive sample complexity, i.e., $O(\sum_{i=2}^{n} 1/\Delta_{[i]}^2)$ arm pulls, with a sublinear memory size.

In this work, we show that the "right answer" to this question is $\Theta(\log n)$ passes (up to $\log\log n$ terms). We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory -- a memory of $o(n/\mathrm{polylog}(n))$ arms -- and $O(\sum_{i=2}^{n} 1/\Delta_{[i]}^2 \cdot \log(n))$ arm pulls has to make $\Omega(\log n/\log\log n)$ passes over the stream. We then give a nearly matching algorithm that, assuming the knowledge of $\Delta_{[2]}$, finds the best arm in $O(\log n)$ passes with $O(\sum_{i=2}^{n} 1/\Delta_{[i]}^2 \cdot \log n)$ arm pulls and a *single-arm* memory.
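As a purely illustrative aside, the single-arm-memory streaming model discussed above can be sketched as a naive one-pass "challenger" rule that, given the gap $\Delta_{[2]}$, pulls each arriving arm enough times to resolve differences of that size and keeps only the empirically best arm in memory. The function name, parameters, and Bernoulli reward simulation below are our own illustrative choices; this is *not* the paper's algorithm.

```python
import math
import random


def stream_best_arm(means, gap, delta=0.01, seed=0):
    """Toy single-pass, single-arm-memory challenger rule (illustrative only).

    Assumes the optimality gap `gap` = Delta_[2] is known, and pulls every arm
    often enough that, by Hoeffding's inequality and a union bound over all
    arms, each empirical mean is within gap/2 of its true mean w.p. >= 1-delta.
    """
    rng = random.Random(seed)
    n = len(means)
    pulls = int(8 * math.log(2 * n / delta) / gap**2) + 1

    def empirical_mean(i):
        # Simulated Bernoulli(means[i]) pulls; a real stream samples rewards online.
        return sum(rng.random() < means[i] for _ in range(pulls)) / pulls

    best, best_mean = 0, empirical_mean(0)
    for i in range(1, n):          # arms arrive one by one in the stream
        m = empirical_mean(i)      # memory holds only the current "king" arm
        if m > best_mean:
            best, best_mean = i, m
    return best
```

Note that this naive rule already uses $O\!\big(n \log(n/\delta)/\Delta_{[2]}^2\big)$ arm pulls, i.e., (roughly) the *worst-case* benchmark: it pays $1/\Delta_{[2]}^2$ for every arm. Attaining the instance-sensitive bound $O(\sum_{i=2}^{n} 1/\Delta_{[i]}^2)$, where easy arms with large gaps are dismissed cheaply, is precisely what requires the multi-pass machinery studied in the paper.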