Online Learning with Vector Costs and Bandits with Knapsacks
Thomas Kesselheim, Sahil Singla
Abstract
We introduce online learning with vector costs (OLVC_p), where in each time step t ∈ {1, …, T} we need to play an action i ∈ {1, …, n} that incurs an unknown vector cost in [0,1]^d. The goal of the online algorithm is to minimize the ℓ_p norm of the sum of its cost vectors. This captures the classical online learning setting for d = 1, and is interesting for general d because of applications like online scheduling, where we want to balance the load between different machines (dimensions). We study OLVC_p in both stochastic and adversarial arrival settings, and give a general procedure to reduce the problem from d dimensions to a single dimension. This allows us to use classical online learning algorithms in both full and bandit feedback models to obtain (near) optimal results. In particular, we obtain a single algorithm (up to the choice of learning rate) that gives sublinear regret for stochastic arrivals and a tight O(min{p, log d}) competitive ratio for adversarial arrivals.

The OLVC_p problem also occurs as a natural subproblem when trying to solve the popular Bandits with Knapsacks (BwK) problem. This connection allows us to use our OLVC_p techniques to obtain (near) optimal results for BwK in both stochastic and adversarial settings. In particular, we obtain a tight O(log d · log T) competitive ratio algorithm for adversarial BwK, which improves over the O(d · log T) competitive ratio algorithm of Immorlica et al. [FOCS'19].
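To make the problem setup concrete — play an action each round, incur a vector cost, and try to keep the ℓ_p norm of the cumulative cost small — here is a minimal simulation. It runs a standard Hedge (multiplicative weights) learner on a hand-rolled scalarization of the vector costs; the instance, noise model, and scalarization below are illustrative assumptions for this sketch, not the paper's reduction or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, d, p = 500, 3, 4, 2            # horizon, actions, cost dimensions, norm
eta = np.sqrt(np.log(n) / T)         # standard Hedge learning rate

# Hypothetical instance: each action has a fixed mean cost vector in [0,1]^d.
means = rng.uniform(0.0, 1.0, size=(n, d))

weights = np.ones(n)                 # Hedge weights over actions
total_cost = np.zeros(d)             # running sum of incurred cost vectors

for t in range(T):
    probs = weights / weights.sum()
    i = rng.choice(n, p=probs)       # sample an action from Hedge's distribution

    # Full feedback: observe every action's (noisy) cost vector this round.
    costs = np.clip(means + rng.normal(0.0, 0.1, size=(n, d)), 0.0, 1.0)
    total_cost += costs[i]

    # Illustrative scalarization (not the paper's reduction): weight each
    # dimension by the lp-norm subgradient direction at the current load,
    # so heavily loaded dimensions cost more, then feed scalars to Hedge.
    g = total_cost ** (p - 1)
    g = g / g.sum() if g.sum() > 0 else np.full(d, 1.0 / d)
    scalar_losses = costs @ g        # in [0, 1] since g sums to 1
    weights *= np.exp(-eta * scalar_losses)

alg_norm = np.linalg.norm(total_cost, ord=p)
print(f"lp norm of the algorithm's total cost after {T} steps: {alg_norm:.1f}")
```

For d = 1 this collapses to ordinary online learning with scalar losses; for d > 1 the interesting effect is that a mixture of actions can balance load across dimensions and achieve a smaller ℓ_p norm than any single fixed action.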