
Contextual Bandits for Unbounded Context Distributions

2024-08-19

Puning Zhao, Rongfei Fan, Shaowei Wang, Li Shen, Qixin Zhang, Zong Ke, Tianhang Zheng

Abstract

The nonparametric contextual bandit is an important model of sequential decision-making problems. Under the α-Tsybakov margin condition, existing research has established a regret bound of Õ(T^{1-(α+1)/(d+2)}) for bounded supports. However, the optimal regret with unbounded contexts has not been analyzed. The challenge of solving contextual bandit problems with unbounded support is to achieve both the exploration-exploitation tradeoff and the bias-variance tradeoff simultaneously. In this paper, we solve the nonparametric contextual bandit problem with unbounded contexts. We propose two nearest-neighbor methods combined with UCB exploration. The first method uses a fixed number of neighbors k; our analysis shows that it achieves minimax-optimal regret under a weak margin condition and relatively light-tailed context distributions. The second method selects k adaptively. With a proper data-driven choice of k, this method achieves an expected regret of Õ(T^{1-(α+1)β/(α+(d+2)β)} + T^{1-β}), in which β is a parameter describing the tail strength. This bound matches the minimax lower bound up to logarithmic factors, indicating that the second method is approximately optimal.
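To make the first method concrete, here is a minimal sketch of a fixed-k nearest-neighbor policy with a UCB exploration bonus. This is an illustrative toy under assumed details, not the paper's algorithm: the function names (`knn_ucb_bandit`, `reward_fn`), the bonus form `c·sqrt(log(t+1)/n)`, and the Euclidean metric are all assumptions made for the example.

```python
import numpy as np

def knn_ucb_bandit(contexts, reward_fn, n_arms, k=20, c=1.0):
    """Hypothetical fixed-k nearest-neighbor UCB sketch.

    At each round, every arm's reward is estimated by averaging the
    rewards observed at the k nearest past contexts where that arm
    was pulled, plus a UCB bonus that shrinks as more neighbors are
    available. Unexplored arms get an infinite index, forcing an
    initial pull of each arm.
    """
    T, d = contexts.shape
    hist_x = [[] for _ in range(n_arms)]  # past contexts, per arm
    hist_r = [[] for _ in range(n_arms)]  # past rewards, per arm
    total = 0.0
    for t in range(T):
        x = contexts[t]
        ucb = np.full(n_arms, np.inf)
        for a in range(n_arms):
            n = len(hist_x[a])
            if n == 0:
                continue  # keep +inf so the arm gets explored
            dists = np.linalg.norm(np.array(hist_x[a]) - x, axis=1)
            idx = np.argsort(dists)[:min(k, n)]
            mean = np.mean(np.array(hist_r[a])[idx])
            bonus = c * np.sqrt(np.log(t + 1) / len(idx))
            ucb[a] = mean + bonus
        a = int(np.argmax(ucb))
        r = reward_fn(x, a)
        hist_x[a].append(x)
        hist_r[a].append(r)
        total += r
    return total
```

Note that contexts drawn from a Gaussian (as below) already have unbounded support, which is exactly the regime the paper targets; the adaptive-k variant would additionally let k grow with the local density of past contexts.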
