SOTAVerified

Learning in Generalized Linear Contextual Bandits with Stochastic Delays

2019-12-01NeurIPS 2019Unverified0· sign in to hype

Zhengyuan Zhou, Renyuan Xu, Jose Blanchet

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards are available to the decision maker only after some delay, which is unknown and stochastic, even though a decision must be made at each time step for an incoming set of contexts. We study the performance of upper confidence bound (UCB) based algorithms adapted to this delayed setting. In particular, we design a delay-adaptive algorithm, which we call Delayed UCB, for generalized linear contextual bandits using UCB-style exploration and establish regret bounds under various delay assumptions. In the important special case of linear contextual bandits, we further modify this algorithm and establish a tighter regret bound under the same delay assumptions. Our results contribute to the broad landscape of contextual bandits literature by establishing that UCB algorithms, which are widely deployed in modern recommendation engines, can be made robust to delays.

Tasks

Reproductions