Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning

2020-06-22Unverified0· sign in to hype

Chara Podimata, Aleksandrs Slivkins

Unverified — Be the first to reproduce this paper.

Abstract

Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the [0,1] interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually ``zooms in'' on the more promising regions thereof. The goal is to take advantage of ``nicer'' problem instances, while retaining near-optimal worst-case performance. While the stochastic version of the problem is well-understood, the general version with adversarial rewards is not. We provide the first algorithm (Adversarial Zooming) for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds. In particular, we recover the worst-case optimal regret bound for the adversarial version, and the instance-dependent regret bound for the stochastic version. We apply our algorithm to several fundamental applications -- including dynamic pricing and auction reserve tuning -- all under adversarial reward models. While these domains often violate Lipschitzness, our analysis only requires a weaker version thereof, allowing for meaningful regret bounds without additional smoothness assumptions. Notably, we extend our results to multi-product dynamic pricing with non-smooth reward structures, a setting which does not even satisfy one-sided Lipschitzness.

Tasks

Multi-Armed Bandits

Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning

Abstract

Tasks

Reproductions