A Further Efficient Algorithm with Best-of-Both-Worlds Guarantees for m-Set Semi-Bandit Problem
Botao Chen, Jongyeong Lee, Chansoo Kim, Junya Honda
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
This paper studies the optimality and complexity of Follow-the-Perturbed-Leader (FTPL) policy in m-set semi-bandit problems. FTPL has been studied extensively as a promising candidate of an efficient algorithm with favorable regret for adversarial combinatorial semi-bandits. Nevertheless, the optimality of FTPL has still been unknown unlike Follow-the-Regularized-Leader (FTRL) whose optimality has been proved for various tasks of online learning. In this paper, we extend the analysis of FTPL with geometric resampling (GR) to m-set semi-bandits, which is a special case of combinatorial semi-bandits, showing that FTPL with Fréchet and Pareto distributions with certain parameters achieves the best possible regret of O(mdT) in adversarial setting. We also show that FTPL with Fréchet and Pareto distributions with a certain parameter achieves a logarithmic regret for stochastic setting, meaning the Best-of-Both-Worlds optimality of FTPL for m-set semi-bandit problems. Furthermore, we extend the conditional geometric resampling to m-set semi-bandits for efficient loss estimation in FTPL, reducing the computational complexity from O(d^2) of the original geometric resampling to O(md((d/m)+1)) without sacrificing the regret performance.