Optimal δ-Correct Best-Arm Selection for Heavy-Tailed Distributions
Shubhada Agrawal, Sandeep Juneja, Peter Glynn
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
Given a finite set of unknown distributions or arms that can be sampled, we consider the problem of identifying the one with the maximum mean using a -correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified ) that has minimum sample complexity. Lower bounds for -correct algorithms are well known. -correct algorithms that match the lower bound asymptotically as reduces to zero have been previously developed when arm distributions are restricted to a single parameter exponential family. In this paper, we first observe a negative result that some restrictions are essential, as otherwise, under a -correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a -correct algorithm that matches the lower bound as reduces to zero under the mild restriction that a known bound on the expectation of (1+)^th moment of the underlying random variables exists, for > 0. We also propose batch processing and identify near-optimal batch sizes to speed up the proposed algorithm substantially. The best-arm problem has many learning applications, including recommendation systems and product selection. It is also a well-studied classic problem in the simulation community.