Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits

2024-05-30Unverified0· sign in to hype

Yuchen He, Zichun Ye, Chihao Zhang

Unverified — Be the first to reproduce this paper.

Abstract

We study the stochastic multi-armed bandit problem in the P-pass streaming model. In this problem, the n arms are present in a stream and at most m<n arms and their statistics can be stored in the memory. We give a complete characterization of the optimal regret in terms of m, n and P. Specifically, we design an algorithm with O((n-m)^1+2^P-22^P+1-1 n^2-2^P+12^P+1-1 T^2^P2^P+1-1) regret and complement it with an ((n-m)^1+2^P-22^P+1-1 n^2-2^P+12^P+1-1 T^2^P2^P+1-1) lower bound when the number of rounds T is sufficiently large. Our results are tight up to a logarithmic factor in n and P.

Tasks

Multi-Armed Bandits

Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits

Abstract

Tasks

Reproductions