Faster Gradient Methods for Highly-Smooth Stochastic Bilevel Optimization
Lesi Chen, Junru Li, El Mahdi Chayti, Jingzhao Zhang
Abstract
This paper studies the complexity of finding an ε-stationary point in stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the fully first-order method F^2SA, which achieves an O(ε^-6) upper complexity bound for first-order smooth problems. This is slower than the optimal Ω(ε^-4) lower bound for its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F^2SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods, F^2SA-p, that uses a pth-order finite difference for the hyper-gradient approximation and improves the upper bound to O(p ε^{-(4+2/p)}) for pth-order smooth problems. Finally, we demonstrate that the Ω(ε^-4) lower bound also holds for stochastic bilevel problems when the higher-order smoothness holds in the lower-level variable, indicating that the upper bound of F^2SA-p is nearly optimal in the highly smooth regime p = Ω(log ε^{-1} / log log ε^{-1}).
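To illustrate the finite-difference idea behind the abstract (not the paper's algorithm itself), the sketch below compares a first-order forward difference against a second-order central difference on a smooth scalar function. Higher-order schemes exploit extra smoothness to shrink the approximation error at the same step size h, which is the mechanism F^2SA-p leverages for hyper-gradient approximation; the function and step sizes here are illustrative choices, not from the paper.

```python
import math

def forward_difference(f, x, h):
    # First-order scheme: truncation error O(h).
    return (f(x + h) - f(x)) / h

def central_difference(f, x, h):
    # Second-order scheme: truncation error O(h^2),
    # valid when f has one more order of smoothness.
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Approximate d/dx sin(x) = cos(x) at x = 1.
f, x = math.sin, 1.0
exact = math.cos(x)
for h in (1e-1, 1e-2, 1e-3):
    fwd_err = abs(forward_difference(f, x, h) - exact)
    ctr_err = abs(central_difference(f, x, h) - exact)
    print(f"h={h:.0e}  forward err={fwd_err:.2e}  central err={ctr_err:.2e}")
```

Running this shows the forward error decaying linearly in h while the central error decays quadratically, mirroring how higher-order smoothness buys a better bias/step-size trade-off.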