
Multi-Armed Bandits with Metric Movement Costs

2017-10-24 · NeurIPS 2017

Tomer Koren, Roi Livni, Yishay Mansour


Abstract

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure C of the underlying metric which depends on its covering numbers. In finite metric spaces with k actions, we give an efficient algorithm that achieves regret of the form Õ(max{C^{1/3}T^{2/3}, √(kT)}), and show that this is the best possible. Our regret bound generalizes previously known regret bounds for some special cases: (i) the unit switching cost regret Θ̃(max{k^{1/3}T^{2/3}, √(kT)}) where C = Θ(k), and (ii) the interval metric with regret Θ̃(max{T^{2/3}, √(kT)}) where C = Θ(1). For infinite metric spaces with Lipschitz loss functions, we derive a tight regret bound of Θ̃(T^{(d+1)/(d+2)}) where d ≥ 1 is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.
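To make the stated rates concrete, here is a small Python sketch (not from the paper; the function names are hypothetical) that evaluates the leading terms of the bounds, ignoring constants and the logarithmic factors hidden by the tilde notation:

```python
import math

def finite_metric_regret(C: float, k: int, T: int) -> float:
    """Leading term of the finite-metric bound max{C^(1/3) * T^(2/3), sqrt(k*T)},
    with constants and log factors (the O-tilde) dropped."""
    return max(C ** (1 / 3) * T ** (2 / 3), math.sqrt(k * T))

def lipschitz_regret(d: float, T: int) -> float:
    """Leading term T^((d+1)/(d+2)) for infinite metric spaces with
    Minkowski dimension d >= 1 and Lipschitz loss functions."""
    return T ** ((d + 1) / (d + 2))

# Special cases from the abstract:
#   unit switching cost: C = Theta(k), so the movement term is k^(1/3) * T^(2/3)
#   interval metric:     C = Theta(1), so the movement term is T^(2/3)
print(finite_metric_regret(C=8, k=8, T=10**6))  # movement term ~2*10^4 dominates sqrt(8*10^6)
print(lipschitz_regret(d=1, T=10**6))           # ~10^4, matching the interval-metric rate T^(2/3)
```

For small C the √(kT) term dominates and the bound recovers the standard bandit rate; the movement term takes over once the metric's complexity C grows.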
