SOTAVerified

Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise

2024-02-10Unverified0· sign in to hype

Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Andrey Pudovikov

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive the regret bounds corresponding to the convergence rates of the optimization methods. We propose a new algorithm Clipped-SGD-UCB and show, both theoretically and empirically, that in the case of symmetric noise in the reward, we can achieve an O( TKT T) regret bound instead of O (T^11+ K^1+ ) for the case when the reward distribution satisfies E_X D[|X|^1+] ^1+ ( (0, 1]), i.e. perform better than it is assumed by the general lower bound for bandits with heavy-tails. Moreover, the same bound holds even when the reward distribution does not have the expectation, that is, when <0.

Tasks

Reproductions