Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs
Alexander Ryabchenko, Idan Attias, Daniel M. Roy
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
We study online learning with oblivious losses and delays under a novel ``capacity constraint'' that limits how many past rounds can be tracked simultaneously for delayed feedback. Under ``clairvoyance'' (i.e., delay durations are revealed upfront each round) and/or ``preemptibility'' (i.e., we have ability to stop tracking previously chosen round feedback), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the ``optimal capacity'' needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance gracefully degrading under suboptimal capacity. For K actions and total delay D over T rounds, under clairvoyance and assuming capacity C = ((T)), we achieve regret (TK + DK/C + D(K)) for bandits and ((D+T)(K)) for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound d_, adding O(d_) to the regret. For fixed delays d (i.e., D=Td), the minimax regret is (TK(1+d/C)+Td(K)) and the optimal capacity is ( /(K),d\) in the bandit setting, while in the full-information setting, the minimax regret is (T(d+1)(K)) and the optimal capacity is (1). For round-dependent and fixed delays, our upper bounds are achieved using novel scheduling policies, based on Pareto-distributed proxy delays and batching techniques. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.