Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets
2022-05-21
Gene Li, Cong Ma, Nathan Srebro
Abstract
We present a family $\{\hat{\pi}_p\}_{p \ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.