Restless Linear Bandits
Azadeh Khaleghi
Abstract
A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown ℝ^d-valued stationary φ-mixing sequence of parameters (θ_t, t ∈ ℕ) which gives rise to pay-offs. This instance of the problem can be viewed as a generalization of both the classical linear bandits with iid noise, and the finite-armed restless bandits. In light of the well-known computational hardness of optimal policies for restless bandits, an approximation is proposed whose error is shown to be controlled by the φ-dependence between consecutive θ_t. An optimistic algorithm, called LinMix-UCB, is proposed for the case where θ_t has an exponential mixing rate. The proposed algorithm is shown to incur a sub-linear regret of O(d√(n polylog(n))) with respect to an oracle that always plays a multiple of 𝔼θ_t. The main challenge in this setting is to ensure that the exploration-exploitation strategy is robust against long-range dependencies. The proposed method relies on Berbee's coupling lemma to carefully select near-independent samples and construct confidence ellipsoids around empirical estimates of 𝔼θ_t.
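The core estimation idea in the abstract, selecting near-independent samples from a dependent stationary sequence and building a confidence region around the empirical mean, can be illustrated with a minimal sketch. This is not the paper's LinMix-UCB algorithm: the AR(1) process, the block length `k`, and the naive per-coordinate confidence width below are all assumptions made purely for illustration of the block-subsampling principle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a dependent, stationary AR(1) sequence of
# d-dimensional parameters theta_t whose autocorrelation decays
# geometrically, loosely mimicking an exponential mixing rate.
d, n, rho = 3, 10_000, 0.9
mean_theta = np.array([1.0, -0.5, 2.0])  # the unknown E[theta_t]
theta = np.zeros((n, d))
theta[0] = mean_theta
for t in range(1, n):
    noise = rng.normal(scale=np.sqrt(1.0 - rho**2), size=d)
    theta[t] = mean_theta + rho * (theta[t - 1] - mean_theta) + noise

# Berbee-style idea (sketch only): retain one sample per block of
# length k; the retained samples are then nearly independent, so a
# plain empirical mean of them estimates E[theta_t].
k = 50
sub = theta[::k]
est = sub.mean(axis=0)

# Naive per-coordinate confidence width, treating the subsampled
# points as independent; this is only valid up to the residual
# mixing error, which is exactly what block selection keeps small.
width = sub.std(axis=0, ddof=1) / np.sqrt(len(sub))
print("estimate:", est)
print("half-widths:", width)
```

With ρ = 0.9 and blocks of length 50, consecutive retained samples have autocorrelation ρ^50 ≈ 0.005, so the estimate lands close to the true mean; in the paper this trade-off between block length and effective sample size is what the coupling-based confidence ellipsoids make precise.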