LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

2024-03-05Unverified0· sign in to hype

Masahiro Kato, Shinji Ito

Unverified — Be the first to reproduce this paper.

Abstract

This study considers the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. In this problem, existing studies have proposed Best-of-Both-Worlds (BoBW) algorithms whose regrets satisfy O(^2(T)) for the number of rounds T in a stochastic regime with a suboptimality gap lower-bounded by a positive constant, while satisfying O(T) in an adversarial regime. However, the dependency on T has room for improvement, and the suboptimality-gap assumption can be relaxed. For this issue, this study proposes an algorithm whose regret satisfies O((T)) in the setting when the suboptimality gap is lower-bounded. Furthermore, we introduce a margin condition, a milder assumption on the suboptimality gap. That condition characterizes the problem difficulty linked to the suboptimality gap using a parameter (0, ]. We then show that the algorithm's regret satisfies O(\(T)\^1+2+T^12+). Here, = corresponds to the case in the existing studies where a lower bound exists in the suboptimality gap, and our regret satisfies O((T)) in that case. Our proposed algorithm is based on the Follow-The-Regularized-Leader with the Tsallis entropy and referred to as the -Linear-Contextual (LC)-Tsallis-INF.

Tasks

Multi-Armed Bandits

LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

Abstract

Tasks

Reproductions