Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture
Lijie Chen, Jian Li
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
The best arm identification problem (BEST-1-ARM) is the most basic pure exploration problem in stochastic multi-armed bandits. The problem has a long history and attracted significant attention for the last decade. However, we do not yet have a complete understanding of the optimal sample complexity of the problem: The state-of-the-art algorithms achieve a sample complexity of O(_i=2^n _i^-2(^-1 + _i^-1)) (_i is the difference between the largest mean and the i^th mean), while the best known lower bound is (_i=2^n _i^-2^-1) for general instances and (^-2 ^-1) for the two-arm instances. We propose to study the instance-wise optimality for the BEST-1-ARM problem. Previous work has proved that it is impossible to have an instance optimal algorithm for the 2-arm problem. However, we conjecture that modulo the additive term (_2^-2 _2^-1) (which is an upper bound and worst case lower bound for the 2-arm problem), there is an instance optimal algorithm for BEST-1-ARM. Moreover, we introduce a new quantity, called the gap entropy for a best-arm problem instance, and conjecture that it is the instance-wise lower bound. Hence, resolving this conjecture would provide a final answer to the old and basic problem.