
Bandit Learning with Implicit Feedback

2018-12-01 · NeurIPS 2018 · Code Available

Yi Qi, Qingyun Wu, Hongning Wang, Jie Tang, Maosong Sun


Abstract

Implicit feedback, such as user clicks, is abundant in online information service systems, but it does not provide substantial evidence of users' evaluation of the system's output. Without proper modeling, such incomplete supervision inevitably misleads model estimation, especially in a bandit learning setting where the feedback is acquired on the fly. In this work, we perform contextual bandit learning with implicit feedback by modeling the feedback as a composition of user result examination and relevance judgment. Since users' examination behavior is unobserved, we introduce latent variables to model it. We perform Thompson sampling on top of variational Bayesian inference for arm selection and model update. Our regret upper bound analysis of the proposed algorithm proves the feasibility of learning from implicit feedback in a bandit setting, and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice.
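The composition of examination and relevance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that both examination and relevance are logistic functions of separate context features, and that the posteriors over the two parameter vectors are approximated by Gaussians; the variational Bayesian update that would produce those Gaussians is omitted. All names (`click_probability`, `thompson_select`, `x_rel`, `x_exam`, and so on) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def click_probability(x_rel, x_exam, theta_rel, theta_exam):
    """Assumed click model: a click requires examination AND relevance.

    P(click) = P(examined | x_exam) * P(relevant | x_rel),
    with each factor modeled as a logistic function (an assumption).
    """
    return sigmoid(x_exam @ theta_exam) * sigmoid(x_rel @ theta_rel)


def thompson_select(arms_rel, arms_exam, mu_rel, cov_rel, mu_exam, cov_exam, rng):
    """Pick an arm by sampling parameters from Gaussian posterior approximations.

    arms_rel:  (K, d_rel) relevance features, one row per arm
    arms_exam: (K, d_exam) examination features, one row per arm
    """
    # Draw one parameter sample per component from its approximate posterior.
    theta_rel = rng.multivariate_normal(mu_rel, cov_rel)
    theta_exam = rng.multivariate_normal(mu_exam, cov_exam)
    # Score every arm by its sampled click probability and play the best one.
    scores = click_probability(arms_rel, arms_exam, theta_rel, theta_exam)
    return int(np.argmax(scores))


# Usage: two arms with opposite relevance features and identical examination features.
rng = np.random.default_rng(0)
arms_rel = np.array([[1.0, 0.0], [0.0, 1.0]])
arms_exam = np.array([[1.0], [1.0]])
chosen = thompson_select(
    arms_rel, arms_exam,
    mu_rel=np.array([2.0, -2.0]), cov_rel=np.eye(2),
    mu_exam=np.array([0.0]), cov_exam=np.eye(1),
    rng=rng,
)
```

Separating the two factors is what lets a non-click be attributed either to an unexamined result or to an irrelevant one, rather than being treated as negative feedback outright.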
