Nonstochastic Bandits with Infinitely Many Experts
X. Flora Meng, Tuhin Sarkar, Munther A. Dahleh
Abstract
We study the problem of nonstochastic bandits with expert advice, extending the setting from finitely many experts to any countably infinite set: a learner aims to maximize the total reward by taking actions sequentially based on bandit feedback while benchmarking against a set of experts. We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings while preserving the order of the regret upper bound. We then incorporate the variant into a meta-algorithm that works on infinitely many experts. We prove a high-probability upper bound of O(i^* K + √(KT)) on the regret, up to polylog factors, where i^* is the unknown position of the best expert, K is the number of actions, and T is the time horizon. We also provide an example of structured experts and discuss how to expedite learning in such cases. Our meta-learning algorithm achieves optimal regret up to polylog factors when i^* = O(√(T/K)). If a prior distribution is assumed on i^*, the probability of achieving optimality increases with T, potentially at a fast rate.
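As background for the setting described above (not code from the paper), the following is a minimal sketch of the classic Exp4 routine that Exp4.P builds on: exponential weights over a finite pool of experts, with importance-weighted reward estimates from bandit feedback. The function name, parameters, and exploration scheme are illustrative assumptions, not the authors' variant.

```python
import numpy as np

def exp4(advice, rewards, gamma=0.1, rng=None):
    """Minimal Exp4-style sketch (illustrative; not the paper's Exp4.P variant).

    advice:  shape (T, N, K) -- expert i's action distribution at round t
    rewards: shape (T, K)    -- reward of each action at each round, in [0, 1]
    Returns (total reward collected, final expert weights).
    """
    rng = rng if isinstance(rng, np.random.Generator) else np.random.default_rng(rng)
    T, N, K = advice.shape
    w = np.ones(N)                                    # one weight per expert
    total = 0.0
    for t in range(T):
        q = w / w.sum()                               # distribution over experts
        p = (1 - gamma) * (q @ advice[t]) + gamma / K # mix advice with uniform exploration
        a = rng.choice(K, p=p)
        total += rewards[t, a]
        xhat = np.zeros(K)
        xhat[a] = rewards[t, a] / p[a]                # importance-weighted reward estimate
        yhat = advice[t] @ xhat                       # estimated reward of each expert
        w *= np.exp(gamma * yhat / K)                 # exponential-weights update
    return total, w
```

The meta-algorithm over countably many experts can then be thought of as running such a finite-expert routine on growing prefixes of the expert sequence, so that the unknown position i^* of the best expert enters the regret bound.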