SOTAVerified

Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling

2025-02-20

Hao Qin, Kwang-Sung Jun, Chicheng Zhang


Abstract

We study the problem of K-armed bandits with reward distributions belonging to a one-parameter exponential family. Several criteria have been proposed in the literature to evaluate the performance of such algorithms, including Asymptotic Optimality, Minimax Optimality, Sub-UCB, and variance-adaptive worst-case regret bounds. Thompson Sampling-based and Upper Confidence Bound-based algorithms have been employed to achieve some of these criteria, but no existing algorithm satisfies all of them simultaneously. In this paper, we design an algorithm, Exponential Kullback-Leibler Maillard Sampling (abbrev. Exp-KL-MS), that achieves multiple optimality criteria simultaneously: Asymptotic Optimality, Minimax Optimality up to a $\sqrt{\ln K}$ factor, Sub-UCB, and a variance-adaptive worst-case regret bound.
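As background for readers unfamiliar with the Maillard Sampling family the abstract refers to, here is a minimal Python sketch of a KL-style Maillard Sampling rule for Bernoulli rewards, following the KL-MS line of work this paper builds on: each arm is sampled with probability proportional to exp(-N_a · kl(μ̂_a, μ̂_max)), where kl is the binary KL divergence. This is an illustrative sketch, not the paper's Exp-KL-MS algorithm (which handles general one-parameter exponential families); all function names are ours.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """Binary KL divergence kl(p, q), clamped away from {0, 1} to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ms_probs(means, counts):
    """Arm-sampling probabilities p_a proportional to exp(-N_a * kl(mu_hat_a, mu_hat_max)).

    `means` holds the empirical means, `counts` the pull counts N_a.
    The empirically best arm gets weight exp(0) = 1; every other arm is
    discounted exponentially in how confidently it looks suboptimal.
    """
    mu_max = max(means)
    weights = [math.exp(-n * kl_bernoulli(m, mu_max))
               for m, n in zip(means, counts)]
    total = sum(weights)
    return [w / total for w in weights]

# Example: after 50 pulls per arm, nearly all mass sits on the best arm.
probs = kl_ms_probs([0.9, 0.5, 0.4], [50, 50, 50])
```

In a bandit loop one would then draw the next arm with `random.choices(range(K), weights=probs)`, observe the reward, and update that arm's mean and count. Unlike UCB-style index rules, the sampling probabilities here are available in closed form, which is one practical appeal of the Maillard Sampling approach.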
