OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality
Ganzhao Yuan
Abstract
The Exponential Moving Average (EMA) is a cornerstone of widely used optimizers such as Adam. However, existing theoretical analyses of Adam-style methods have notable limitations: their guarantees can remain suboptimal in the zero-noise regime, rely on restrictive boundedness conditions (e.g., bounded gradients or objective gaps), use constant or open-loop stepsizes, or require prior knowledge of Lipschitz constants. To overcome these bottlenecks, we introduce OptEMA and analyze two novel variants: OptEMA-M, which applies an adaptive, decreasing EMA coefficient to the first-order moment with a fixed second-order decay, and OptEMA-V, which swaps these roles. Crucially, OptEMA is closed-loop and Lipschitz-free in the sense that its effective stepsizes are trajectory-dependent and do not require the Lipschitz constant for parameterization. Under standard stochastic gradient descent (SGD) assumptions, namely smoothness, a lower-bounded objective, and unbiased gradients with bounded variance, we establish rigorous convergence guarantees. Both variants achieve a noise-adaptive convergence rate of $\mathcal{O}(T^{-1/2} + \sigma^{1/2} T^{-1/4})$ for the average gradient norm, where $\sigma$ is the noise level. In particular, in the zero-noise regime where $\sigma = 0$, our bounds reduce to the nearly optimal deterministic rate $\mathcal{O}(T^{-1/2})$ without manual hyperparameter retuning.
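The abstract does not give OptEMA's update formulas. The following is therefore only a hypothetical Adam-style sketch of the structural difference between the two variants: OptEMA-M places a decreasing EMA coefficient on the first moment and keeps a fixed second-moment decay, while OptEMA-V does the reverse. Several pieces are illustrative assumptions rather than the paper's method:

- "Decreasing EMA coefficient" is read as a decreasing weight on the newest gradient.
- The schedule `decreasing_weight` (1/sqrt(t+1)) is a stand-in.
- The constant learning rate `lr`, the bias correction, and every function name are hypothetical.

Note that the schedule and stepsize here are open-loop, whereas the paper's coefficients are adaptive and trajectory-dependent.

```python
import math
from typing import Callable, List

Vector = List[float]


def decreasing_weight(t: int) -> float:
    """HYPOTHETICAL placeholder: weight on the newest sample at step t (0-based).

    Equals 1 at t = 0 and decreases like 1/sqrt(t + 1). This is an open-loop
    stand-in, NOT the adaptive, trajectory-dependent coefficient of OptEMA.
    """
    return 1.0 / math.sqrt(t + 1)


def run_ema_optimizer(
    grad_fn: Callable[[Vector], Vector],
    x0: Vector,
    steps: int,
    variant: str = "M",  # "M": schedule on 1st moment; "V": schedule on 2nd moment
    lr: float = 0.1,  # hypothetical constant stepsize (OptEMA's is closed-loop)
    fixed_beta1: float = 0.9,
    fixed_beta2: float = 0.999,
    eps: float = 1e-8,
) -> Vector:
    """Hypothetical Adam-style loop contrasting the M and V coefficient roles."""
    if variant not in ("M", "V"):
        raise ValueError("variant must be 'M' or 'V'")
    x = list(x0)
    m = [0.0] * len(x)  # first-moment EMA
    v = [0.0] * len(x)  # second-moment EMA
    p1 = p2 = 1.0  # running products of beta1 / beta2, used for bias correction
    for t in range(steps):
        g = grad_fn(x)
        a = decreasing_weight(t)
        # The variant decides which moment gets the decreasing coefficient.
        beta1 = 1.0 - a if variant == "M" else fixed_beta1
        beta2 = fixed_beta2 if variant == "M" else 1.0 - a
        m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
        p1 *= beta1
        p2 *= beta2
        # Bias correction: 1 - prod(beta) is the total weight placed on observed
        # gradients, so dividing by it turns each EMA into a weighted average.
        c1, c2 = 1.0 - p1, 1.0 - p2
        x = [
            xi - lr * (mi / c1) / (math.sqrt(vi / c2) + eps)
            for xi, mi, vi in zip(x, m, v)
        ]
    return x


if __name__ == "__main__":
    # Deterministic (zero-noise) toy problem: f(x) = sum(x_i^2), gradient 2x.
    f = lambda z: sum(zi * zi for zi in z)
    grad = lambda z: [2.0 * zi for zi in z]
    x0 = [1.0, -2.0]
    for variant in ("M", "V"):
        xT = run_ema_optimizer(grad, x0, steps=200, variant=variant)
        print(variant, f(x0), "->", f(xT))
```

Running the two variants on the same deterministic quadratic shows that the only difference between them is which moment receives the decreasing coefficient. Convergence rates should not be read off this toy, because its schedule and stepsize are not the paper's.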