Fully Gap-Dependent Bounds for Multinomial Logit Bandit

2020-11-19Unverified0· sign in to hype

Jiaqi Yang

Unverified — Be the first to reproduce this paper.

Abstract

We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most K from a pool of N items, and the buyer purchases an item from the assortment according to a MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment S^* within O(_i = 1^N _i^-2) time steps with high probability, and (ii) an algorithm that incurs O(_i S^* K_i^-1 T) regret in T time steps. To our knowledge, our algorithms are the first to achieve gap-dependent bounds that fully depends on the suboptimality gaps of all items. Our technical contributions include an algorithmic framework that relates the MNL-bandit problem to a variant of the top-K arm identification problem in multi-armed bandits, a generalized epoch-based offering procedure, and a layer-based adaptive estimation procedure.

Tasks

Multi-Armed Bandits

Fully Gap-Dependent Bounds for Multinomial Logit Bandit

Abstract

Tasks

Reproductions