Stochastic Multi-armed Bandits in Constant Space

2017-12-25

David Liau, Eric Price, Zhao Song, Ger Yang

Abstract

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all \(K\) arms. We give an algorithm using \(O(1)\) words of space with regret \[ \sum_{i=1}^{K} \frac{1}{\Delta_i} \log \frac{\Delta_i}{\Delta} \log T \] where \(\Delta_i\) is the gap between the best arm and arm \(i\) and \(\Delta\) is the gap between the best and the second-best arms. If the rewards are bounded away from 0 and 1, this is within an \(O(\log 1/\Delta)\) factor of the optimum regret possible without space constraints.
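To get a feel for the size of this bound, the following minimal sketch evaluates the regret expression from the abstract with all constants dropped. It is not the paper's algorithm, only arithmetic on the stated formula. The function names, the example gaps, and the choice to skip the best arm (whose gap is 0) are assumptions made for illustration. The comparison quantity, the sum of \(\frac{1}{\Delta_i} \log T\) over suboptimal arms, is likewise assumed here to stand in for the unconstrained optimum that the abstract mentions.

```python
import math


def constant_space_bound(gaps, horizon):
    """Evaluate sum_i (1/Delta_i) * log(Delta_i/Delta) * log(T), constants dropped.

    gaps: positive gaps Delta_i of the suboptimal arms only. The best arm
    (gap 0) is skipped, which is an assumption to keep every term finite.
    horizon: number of rounds T.
    """
    delta = min(gaps)  # Delta: gap between the best and the second-best arms
    return sum(
        (1.0 / g) * math.log(g / delta) * math.log(horizon) for g in gaps
    )


def unconstrained_bound(gaps, horizon):
    """Assumed comparison quantity: sum_i (1/Delta_i) * log(T), constants dropped."""
    return sum((1.0 / g) * math.log(horizon) for g in gaps)


# Hypothetical instance: three suboptimal arms, T = 1000 rounds.
example_gaps = [0.1, 0.2, 0.4]
T = 1000

ratio = constant_space_bound(example_gaps, T) / unconstrained_bound(example_gaps, T)
print(f"ratio of the two expressions: {ratio:.3f}")
print(f"log(1/Delta):                 {math.log(1 / min(example_gaps)):.3f}")
```

Because each gap satisfies \(\Delta_i \le 1\) when rewards lie in \([0, 1]\), every term's extra factor \(\log(\Delta_i/\Delta)\) is at most \(\log(1/\Delta)\), so the printed ratio never exceeds \(\log(1/\Delta)\). That is the sense in which the two expressions differ by at most an \(O(\log 1/\Delta)\) factor.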

Tasks

Multi-Armed Bandits
