An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces
Alexander Terenin, Jeffrey Negrea
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
We develop a form Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future actions, rather than the space of experts. We show regret decomposes into regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially-uncountably-infinite number of experts, we show that Thompson sampling over the d-dimensional unit cube, using a certain Gaussian process prior widely-used in the Bayesian optimization literature, has a O(Td(1+d)) rate against a -bounded -Lipschitz adversary.