Leveraging Offline Data in Linear Latent Bandits

2024-05-27

Chinmaya Kausik, Kevin Tan, Ambuj Tewari

Abstract

Sequential decision-making domains such as recommender systems, healthcare, and education often exhibit unobserved heterogeneity in the population that can be modeled using latent bandits, a framework in which an unobserved latent state determines the model for a trajectory. While the latent bandit framework is compelling, the extent of its generality is unclear. We first address this by establishing a de Finetti theorem for decision processes, showing that every exchangeable and coherent stateless decision process is a latent bandit. The latent bandit framework lends itself particularly well to online learning with offline datasets, a problem of growing interest in sequential decision-making. One can leverage offline latent bandit data to learn a complex model for each latent state, so that an agent can simply learn the latent state online to act optimally. We focus on a linear model for a latent bandit with d_A-dimensional actions, where the latent states lie in an unknown d_K-dimensional subspace for d_K ≪ d_A. We present SOLD, a novel principled method to learn this subspace from short offline trajectories with guarantees. We then provide two methods to leverage this subspace online: LOCAL-UCB and ProBALL-UCB. We demonstrate that LOCAL-UCB enjoys Õ(min(d_A√T, d_K√T(1 + √(d_AT/d_KN)))) regret guarantees, where the effective dimension is lower when the size N of the offline dataset is larger. ProBALL-UCB enjoys a slightly weaker guarantee, but is more practical and computationally efficient. Finally, we establish the efficacy of our methods using experiments on both synthetic data and real-life movie recommendation data from MovieLens.
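To make the offline step concrete: the abstract describes learning an unknown d_K-dimensional subspace of the d_A-dimensional parameter space from offline trajectories. The sketch below is not the paper's SOLD algorithm; it is a minimal illustration of the generic idea, assuming each offline trajectory yields a noisy estimate of its latent reward parameter, so that the top singular vectors of the stacked estimates recover the latent subspace. All dimensions and the noise level are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d_A, d_K, N = 20, 3, 500  # action dim, latent dim, number of offline trajectories

# Ground truth: latent reward parameters lie in a random d_K-dimensional subspace.
U_true, _ = np.linalg.qr(rng.standard_normal((d_A, d_K)))

# Assume each offline trajectory gives a noisy estimate of its latent parameter
# (e.g., a per-trajectory least-squares fit); stack the estimates as rows.
thetas = (U_true @ rng.standard_normal((d_K, N))).T          # shape (N, d_A)
noisy = thetas + 0.1 * rng.standard_normal((N, d_A))

# The top-d_K right singular vectors of the stacked matrix span (approximately)
# the latent subspace; noise averages out as N grows.
_, _, Vt = np.linalg.svd(noisy, full_matrices=False)
U_hat = Vt[:d_K].T                                           # shape (d_A, d_K)

# Quality check: singular values of U_true^T U_hat are cosines of the principal
# angles between the two subspaces; values near 1 mean good recovery.
overlap = np.linalg.svd(U_true.T @ U_hat, compute_uv=False)
print("smallest subspace alignment:", overlap.min())
```

Once such a subspace estimate is in hand, an online learner only needs to identify a d_K-dimensional parameter rather than a d_A-dimensional one, which is the source of the improved effective dimension in the regret bound above.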
