Randomized Exploration for Non-Stationary Stochastic Linear Bandits

2019-12-11Code Available0· sign in to hype

Baekjin Kim, Ambuj Tewari

Code Available — Be the first to reproduce this paper.

Code

github.com/baekjin-kim/NonstationaryLB
OfficialIn papernone★ 0
github.com/baekjin-kim/UAI2020_NonstationaryLinearBandit
none★ 0

Abstract

We investigate two perturbation approaches to overcome conservatism that optimism based algorithms chronically suffer from in practice. The first approach replaces optimism with a simple randomization when using confidence sets. The second one adds random perturbations to its current estimate before maximizing the expected reward. For non-stationary linear bandits, where each action is associated with a d-dimensional feature and the unknown parameter is time-varying with total variation B_T, we propose two randomized algorithms, Discounted Randomized LinUCB (D-RandLinUCB) and Discounted Linear Thompson Sampling (D-LinTS) via the two perturbation approaches. We highlight the statistical optimality versus computational efficiency trade-off between them in that the former asymptotically achieves the optimal dynamic regret O(d^7/8 B_T^1/4T^3/4), but the latter is oracle-efficient with an extra logarithmic factor in the number of arms compared to minimax-optimal dynamic regret. In a simulation study, both algorithms show outstanding performance in tackling conservatism issue that Discounted LinUCB struggles with.

Tasks

Computational Efficiency Thompson Sampling

Randomized Exploration for Non-Stationary Stochastic Linear Bandits

Code

Abstract

Tasks

Reproductions