Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets

2022-05-21

Gene Li, Cong Ma, Nathan Srebro


Abstract

We present a family $\{\hat{\pi}_p\}_{p \ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.
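The abstract does not spell out how an $\ell_p$ confidence set turns into a learning rule, so here is a minimal sketch under stated assumptions, not the paper's exact construction. We assume a least-squares estimate $\hat\theta$ of the reward parameter, a feature covariance $\Sigma$, and a confidence set $\Theta_p = \{\theta : \|\Sigma^{1/2}(\theta - \hat\theta)\|_p \le \beta\}$. Each candidate policy $\pi$ is summarized by its expected feature vector $\phi(\pi)$. Under these assumptions, Hölder duality gives a closed form for the worst-case value: $\min_{\theta \in \Theta_p} \langle \theta, \phi(\pi)\rangle = \langle \hat\theta, \phi(\pi)\rangle - \beta \, \|\Sigma^{-1/2}\phi(\pi)\|_q$ with $1/p + 1/q = 1$. The pessimistic rule then picks the candidate with the largest such value. The set form, the radius `beta`, and the helper names (`dual_exponent`, `inv_sqrt_psd`, `pessimistic_value`, `pessimistic_policy`) are hypothetical.

```python
import numpy as np


def dual_exponent(p: float) -> float:
    """Return q with 1/p + 1/q = 1 (q = inf when p = 1, q = 1 when p = inf)."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)


def inv_sqrt_psd(sigma: np.ndarray) -> np.ndarray:
    """Symmetric inverse square root Sigma^{-1/2} of a positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


def pessimistic_value(theta_hat, sigma, feature, p, beta):
    """Worst-case value of <theta, feature> over the ASSUMED confidence set
    {theta : ||Sigma^{1/2} (theta - theta_hat)||_p <= beta}.

    Substituting w = Sigma^{1/2} (theta - theta_hat) and applying Hoelder's
    inequality, the minimum is <theta_hat, feature> - beta * ||Sigma^{-1/2} feature||_q.
    """
    q = dual_exponent(p)
    penalty = beta * np.linalg.norm(inv_sqrt_psd(sigma) @ feature, ord=q)
    return float(theta_hat @ feature - penalty)


def pessimistic_policy(theta_hat, sigma, policy_features, p, beta):
    """Return (index, scores): the candidate with the largest pessimistic value."""
    scores = [pessimistic_value(theta_hat, sigma, f, p, beta) for f in policy_features]
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    # Hypothetical two-armed example: arm 0 seen 100 times, arm 1 only 4 times.
    theta_hat = np.array([0.5, 0.6])
    sigma = np.diag([100.0, 4.0])
    arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    best, scores = pessimistic_policy(theta_hat, sigma, arms, p=np.inf, beta=1.0)
    print(best, scores)
```

In the usage example, the features are one-hot and $\Sigma = \mathrm{diag}(n_a)$. In that case $\|\Sigma^{-1/2} e_a\|_q = 1/\sqrt{n_a}$ for every $q$, so the score reduces to the familiar LCB form $\hat\theta_a - \beta/\sqrt{n_a}$. The rule then prefers the well-sampled arm 0 even though arm 1 has a higher point estimate. Beyond this special case, $p$ changes both the shape of the penalty and the radius $\beta$ needed for coverage. The sketch leaves $\beta$ as a free parameter.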

Tasks

Multi-Armed Bandits
