Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries
Arnab Maiti, Zhiyuan Fan, Kevin Jamieson, Lillian J. Ratliff, Gabriele Farina
Abstract
In this paper, we study the online shortest path problem in directed acyclic graphs (DAGs) under bandit feedback against an adaptive adversary. Given a DAG G = (V, E) with a source node v_s and a sink node v_t, let X ⊆ {0,1}^|E| denote the set of all paths from v_s to v_t. At each round t, we select a path x_t ∈ X and receive bandit feedback on our loss ⟨x_t, y_t⟩ ∈ [-1,1], where y_t is an adversarially chosen loss vector. Our goal is to minimize regret with respect to the best path in hindsight over T rounds. We propose the first computationally efficient algorithm to achieve a near-minimax optimal regret bound of Õ(√(|E| T log |X|)) with high probability against any adaptive adversary, where Õ(·) hides logarithmic factors in the number of edges |E|. Our algorithm leverages a novel loss estimator and a centroid-based decomposition in a nontrivial manner to attain this regret bound. As an application, we show that our algorithm for DAGs provides state-of-the-art efficient algorithms for m-sets, extensive-form games, the Colonel Blotto game, shortest walks in directed graphs, hypercubes, and multi-task multi-armed bandits, achieving improved high-probability regret guarantees in all these settings.
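To make the interaction protocol described above concrete, here is a minimal toy sketch (not the paper's algorithm): it builds a small DAG, enumerates the path set X as 0/1 edge-indicator vectors, and simulates T rounds in which a placeholder uniform-random learner receives only the scalar bandit feedback ⟨x_t, y_t⟩, then computes regret against the best path in hindsight. The graph, the random adversary, and the uniform learner are all illustrative assumptions.

```python
import random

# Toy DAG with source 's' and sink 't'; edges indexed 0..|E|-1.
# This DAG, the random adversary, and the uniform learner are
# placeholders for exposition -- not the paper's construction.
edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]
edge_index = {e: i for i, e in enumerate(edges)}

def enumerate_paths(node="s", target="t"):
    """Return all node->target paths as 0/1 edge-indicator vectors (the set X)."""
    if node == target:
        return [[0] * len(edges)]
    paths = []
    for (u, v), i in edge_index.items():
        if u == node:
            for tail in enumerate_paths(v, target):
                vec = tail[:]
                vec[i] = 1
                paths.append(vec)
    return paths

X = enumerate_paths()  # all v_s -> v_t paths

def play(T=1000, seed=0):
    """Simulate the bandit protocol; return regret vs. the best path in hindsight."""
    rng = random.Random(seed)
    cumulative = [0.0] * len(X)  # full-information bookkeeping, for regret only
    learner_loss = 0.0
    for _ in range(T):
        y = [rng.uniform(-1, 1) for _ in edges]  # adversary's loss vector y_t
        x = rng.choice(X)                        # placeholder learner: uniform over X
        # Bandit feedback: the learner observes only the scalar <x_t, y_t>.
        learner_loss += sum(xi * yi for xi, yi in zip(x, y))
        for j, p in enumerate(X):
            cumulative[j] += sum(pi * yi for pi, yi in zip(p, y))
    return learner_loss - min(cumulative)
```

For this 5-edge DAG, X contains the three paths s→a→t, s→b→t, and s→a→b→t; a real algorithm would replace the uniform choice with an adaptive sampling distribution updated from the scalar feedback alone.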