High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails

2021-06-28 · NeurIPS 2021

Ashok Cutkosky, Harsh Mehta

Abstract

We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high-probability with best-known rates for smooth losses when the gradients only have bounded pth moments for some p ∈ (1,2]. We then consider the case of second-order smooth losses, which to our knowledge have not been studied in this setting, and again obtain high-probability bounds for any p. Moreover, our results hold for arbitrary smooth norms, in contrast to the typical SGD analysis which requires a Hilbert space norm. Further, we show that after a suitable "burn-in" period, the objective value will monotonically decrease for every iteration until a critical point is identified, which provides intuition behind the popular practice of learning rate "warm-up" and also yields a last-iterate guarantee.
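To make the combination named in the abstract concrete, here is a rough sketch of one update that clips the stochastic gradient, averages it into a momentum buffer, and then moves a fixed distance along the normalized momentum. It is an illustration only, not the paper's algorithm as stated: the function names, the choice of the Euclidean norm, the constant hyperparameters `lr`, `beta`, and `tau`, and the toy quadratic objective with symmetric Pareto noise are all assumptions made for this example.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def clip(g, tau):
    """Rescale g so that its Euclidean norm is at most tau."""
    n = l2_norm(g)
    scale = 1.0 if n <= tau else tau / n
    return [scale * c for c in g]


def normalized_clipped_momentum_step(x, m, g, lr, beta, tau):
    """One illustrative update: clip, then momentum, then normalized step.

    Hypothetical helper; lr, beta and tau are assumed constants here,
    not the schedules analyzed in the paper.
    """
    g_clipped = clip(g, tau)
    # Exponential moving average of the clipped gradients.
    m_new = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g_clipped)]
    n = l2_norm(m_new)
    if n == 0.0:
        # No direction to move in; leave the iterate unchanged.
        return list(x), m_new
    # Normalized step: the iterate moves exactly lr in Euclidean distance.
    x_new = [xi - lr * mi / n for xi, mi in zip(x, m_new)]
    return x_new, m_new


def noisy_grad(x, rng):
    """Gradient of 0.5 * ||x||^2 plus symmetric Pareto noise (assumed toy model).

    With shape 1.5 the noise has zero mean but infinite variance.
    """
    return [
        xi + rng.choice((-1.0, 1.0)) * (rng.paretovariate(1.5) - 1.0)
        for xi in x
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    x, m = [5.0, -3.0], [0.0, 0.0]
    for _ in range(2000):
        x, m = normalized_clipped_momentum_step(
            x, m, noisy_grad(x, rng), lr=0.01, beta=0.9, tau=1.0
        )
    print("final gradient norm:", l2_norm(x))
```

Because the step is normalized, every update has length `lr` regardless of how large a single noisy gradient is, while clipping bounds how much any one heavy-tailed sample can pull the momentum buffer.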

Tasks

Stochastic Optimization, Vocal Bursts Intensity Prediction
