SOTAVerified

Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

2021-12-01NeurIPS 2021Unverified0· sign in to hype

Arnab Maiti, Vishakha Patil, Arindam Khan

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time, is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight guarantee. We show (T^2/3) cumulative regret in expectation for single-pass algorithms for arm-memory size of (n-1), where n is the number of arms. For best-arm identification, we provide an (, )-PAC algorithm with arm memory size of O(^*n) and O(n^2 (1)) optimal sample complexity.

Tasks

Reproductions