Skyline Identification in Multi-Armed Bandits
Albert Cheu, Ravi Sundaram, Jonathan Ullman
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of n arms A[1],,A[n], each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the skyline of the set A, consisting of all arms A[i] such that A[i] has larger expected reward than all lower-numbered arms A[1],,A[i-1]. We define a natural notion of an -approximate skyline and prove matching upper and lower bounds for identifying an -skyline. Specifically, we show that in order to identify an -skyline from among n arms with probability 1-, samples are necessary and sufficient. When 1/n, our results improve over the naive algorithm, which draws enough samples to approximate the expected reward of every arm; the algorithm of (Auer et al., AISTATS'16) for Pareto-optimal arm identification is likewise superseded. Our results show that the sample complexity of the skyline problem lies strictly in between that of best arm identification (Even-Dar et al., COLT'02) and that of approximating the expected reward of every arm.