Minimum Empirical Divergence for Sub-Gaussian Linear Bandits
Kapilan Balagopalan, Kwang-Sung Jun
- Code: github.com/Kapilan-Balagopalan/Linear-Bandit-Algorithms
Abstract
We propose a novel linear bandit algorithm called LinMED (Linear Minimum Empirical Divergence), which is a linear extension of the MED algorithm originally designed for multi-armed bandits. LinMED is a randomized algorithm that admits a closed-form computation of the arm sampling probabilities, unlike the popular randomized algorithm called linear Thompson sampling. This feature proves useful for off-policy evaluation, where unbiased evaluation requires accurately computing the sampling probability. We prove that LinMED enjoys a near-optimal regret bound of $d\sqrt{n}$ up to logarithmic factors, where $d$ is the dimension and $n$ is the time horizon. We further show that LinMED enjoys a $\frac{d^2}{\Delta}(\log^2(n))\log(\log(n))$ problem-dependent regret, where $\Delta$ is the smallest sub-optimality gap, which is lower than the $\frac{d^2}{\Delta}\log^3(n)$ regret of the standard algorithm OFUL (Abbasi-Yadkori et al., 2011). Our empirical study shows that LinMED performs competitively with state-of-the-art algorithms.
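The closed-form sampling probabilities that distinguish MED-style methods from posterior-sampling approaches can be illustrated with a minimal sketch. This is an illustrative multi-armed version using a sub-Gaussian (squared-gap) divergence proxy; the function name and exact weighting are assumptions for exposition, not the paper's precise LinMED rule:

```python
import numpy as np

def med_probabilities(means, pulls):
    """Illustrative MED-style sampling distribution (not the exact LinMED rule).

    means: empirical mean reward of each arm
    pulls: number of times each arm has been pulled
    """
    means = np.asarray(means, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    best = means.max()
    # Sub-Gaussian empirical divergence proxy: N_a * (mu_best - mu_a)^2 / 2
    divergence = pulls * (best - means) ** 2 / 2.0
    weights = np.exp(-divergence)   # empirical best arm gets weight 1
    return weights / weights.sum()  # closed-form normalization

# Arms with larger estimated gaps (and more pulls) get exponentially less mass.
probs = med_probabilities(means=[0.9, 0.5, 0.2], pulls=[50, 40, 30])
```

Because the probabilities are available in closed form, the exact propensity of each logged action can be recorded, which is what makes unbiased off-policy estimators (e.g., inverse propensity weighting) straightforward to apply.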