
The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation

2019-06-04 · ICML 2020

Zhe Feng, David C. Parkes, Haifeng Xu


Abstract

Motivated by economic applications such as recommender systems, we study the behavior of stochastic bandit algorithms under strategic behavior conducted by rational actors, i.e., the arms. Each arm is a self-interested strategic player who can modify its own reward whenever pulled, subject to a cross-period budget constraint, in order to maximize its own expected number of times of being pulled. We analyze the robustness of three popular bandit algorithms: UCB, ε-Greedy, and Thompson Sampling. We prove that all three algorithms achieve a regret upper bound O(max{B, K ln T}), where B is the total budget across arms, K is the total number of arms, and T is the length of the time horizon. This regret guarantee holds under an arbitrary adaptive manipulation strategy of the arms. Our second set of main results shows that this regret bound is tight; in fact, for UCB it is tight even when we restrict the arms' manipulation strategies to form a Nash equilibrium. The lower bound makes use of a simple manipulation strategy, the same for all three algorithms, yielding a bound of Ω(max{B, K ln T}). Our results illustrate the robustness of classic bandit algorithms against strategic manipulations as long as B = o(T).
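The following is a minimal simulation sketch, not taken from the paper, of the setting the abstract describes: UCB1 facing a suboptimal arm that inflates its own observed reward whenever it is pulled, until a total manipulation budget B is exhausted. The function name, parameter values, per-round manipulation cap, and the specific inflation strategy are all illustrative assumptions; the paper's exact manipulation model and lower-bound strategy may differ.

# Minimal sketch (assumptions noted above): UCB1 against one strategic arm
# that adds up to 1.0 of manipulation to its own reward per pull, subject
# to a total budget B.
import math
import random

def ucb1_with_strategic_arm(T=20000, B=200.0, mu=(0.9, 0.5), seed=0):
    """Run UCB1 for T rounds; arm 1 (the suboptimal arm) inflates its
    observed reward until its manipulation budget B runs out."""
    rng = random.Random(seed)
    K = len(mu)
    counts = [0] * K          # number of pulls per arm
    sums = [0.0] * K          # sum of observed (possibly manipulated) rewards
    budget = B
    pulls_of_bad_arm = 0
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1       # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean plus confidence bonus
            arm = max(range(K), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < mu[arm] else 0.0
        if arm == 1 and budget > 0:
            boost = min(1.0, budget)   # cap per-round manipulation at 1.0
            reward += boost
            budget -= boost
        counts[arm] += 1
        sums[arm] += reward
        if arm == 1:
            pulls_of_bad_arm += 1
    return pulls_of_bad_arm

if __name__ == "__main__":
    for B in (0.0, 50.0, 200.0):
        n = ucb1_with_strategic_arm(B=B)
        print(f"budget B={B:6.1f} -> suboptimal arm pulled {n} times")

Under these assumptions, the count of suboptimal pulls grows with B but stays far below T once the budget is spent and UCB's confidence intervals recover, which is consistent with the abstract's claim of robustness whenever B = o(T).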
