Generalized Linear Bandits with Limited Adaptivity
Ayush Sawarni, Nirjhar Das, Siddharth Barman, Gaurav Sinha
Code
- github.com/nirjhar-das/glbandit_limited_adaptivity (official, in paper)
- github.com/nick-jhlee/logistic_bandit
Abstract
We study the generalized linear contextual bandit problem under the constraints of limited adaptivity. We present two algorithms, B-GLinCB and RS-GLinCB, that address, respectively, two prevalent limited-adaptivity settings. Given a budget M on the number of policy updates, in the first setting the algorithm must decide upfront the M rounds at which it will update its policy, while in the second setting it can adaptively perform its M policy updates during its course. For the first setting, we design an algorithm B-GLinCB that incurs Õ(√T) regret when M = Ω(log log T) and the arm feature vectors are generated stochastically. For the second setting, we design an algorithm RS-GLinCB that updates its policy Õ(log² T) times and achieves a regret of Õ(√T) even when the arm feature vectors are adversarially generated. Notably, in these bounds we eliminate the dependence on a key instance-dependent parameter κ, which captures the non-linearity of the underlying reward model. Our novel approach for removing this dependence for generalized linear contextual bandits may be of independent interest.
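The adaptive setting described above can be illustrated with the standard determinant-based update trigger used for rarely-switching linear bandits: the policy is refit only when the design matrix has accumulated enough new information. The sketch below is illustrative only and is not the paper's RS-GLinCB; the switching threshold C, the greedy arm choice, and the Newton-based logistic MLE are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, T, C = 3, 2000, 2.0          # dimension, horizon, switching threshold (illustrative)
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

V = np.eye(d)                   # running design matrix
V_last = V.copy()               # design matrix at the last policy update
theta_hat = np.zeros(d)         # current (possibly stale) parameter estimate
X, Y = [], []
updates = 0

for t in range(T):
    arms = rng.normal(size=(10, d))
    # Act greedily w.r.t. the stale estimate; a real algorithm adds an
    # optimism/exploration bonus on top of this.
    a = arms[np.argmax(arms @ theta_hat)]
    r = float(rng.random() < sigmoid(a @ theta_star))   # Bernoulli reward
    X.append(a); Y.append(r)
    V += np.outer(a, a)
    # Rare-switching trigger: refit only when det(V) has grown by factor C
    # since the last update, so the number of updates is logarithmic in T.
    if np.linalg.det(V) > C * np.linalg.det(V_last):
        Xa, Ya = np.array(X), np.array(Y)
        # Regularized logistic MLE via a few Newton steps.
        for _ in range(20):
            p = sigmoid(Xa @ theta_hat)
            g = Xa.T @ (p - Ya) + theta_hat
            H = (Xa.T * (p * (1 - p))) @ Xa + np.eye(d)
            theta_hat -= np.linalg.solve(H, g)
        V_last = V.copy()
        updates += 1

print(updates)   # far fewer policy updates than T rounds
```

The determinant-doubling criterion yields O(d log T) policy updates over T rounds, which is the mechanism rare-switching analyses rely on; the paper's contribution is achieving Õ(√T) regret with such infrequent updates while removing the dependence on κ.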