OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality

2026-03-10

Ganzhao Yuan


Abstract

The Exponential Moving Average (EMA) is a cornerstone of widely used optimizers such as Adam. However, existing theoretical analyses of Adam-style methods have notable limitations: their guarantees can remain suboptimal in the zero-noise regime, rely on restrictive boundedness conditions (e.g., bounded gradients or objective gaps), use constant or open-loop stepsizes, or require prior knowledge of Lipschitz constants. To overcome these bottlenecks, we introduce OptEMA and analyze two novel variants: OptEMA-M, which applies an adaptive, decreasing EMA coefficient to the first-order moment with a fixed second-order decay, and OptEMA-V, which swaps these roles. Crucially, OptEMA is closed-loop and Lipschitz-free in the sense that its effective stepsizes are trajectory-dependent and do not require the Lipschitz constant for parameterization. Under standard stochastic gradient descent (SGD) assumptions, namely smoothness, a lower-bounded objective, and unbiased gradients with bounded variance, we establish rigorous convergence guarantees. Both variants achieve a noise-adaptive convergence rate of O(T^{-1/2} + σ^{1/2} T^{-1/4}) for the average gradient norm, where σ is the noise level. In particular, in the zero-noise regime where σ = 0, our bounds reduce to the nearly optimal deterministic rate O(T^{-1/2}) without manual hyperparameter retuning.
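
As a rough illustration of how the stated rate behaves (this is not the OptEMA algorithm, which the abstract does not spell out), the sketch below evaluates the expression T^{-1/2} + σ^{1/2} T^{-1/4} with all constants hidden by O(·) assumed to be 1. The function names `rate_bound` and `crossover_horizon` are hypothetical. Setting the two terms equal gives T = σ^{-2}, the horizon beyond which the noise term dominates.

```python
import math


def rate_bound(T: int, sigma: float) -> float:
    """Evaluate T^{-1/2} + sigma^{1/2} * T^{-1/4}.

    Assumption: every constant hidden by the O(.) notation is set to 1,
    so only the scaling in T and sigma is meaningful.
    """
    if T <= 0:
        raise ValueError("T must be a positive iteration count")
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return T ** -0.5 + math.sqrt(sigma) * T ** -0.25


def crossover_horizon(sigma: float) -> float:
    """Return the T at which the two terms of the bound are equal.

    Solving T^{-1/2} = sigma^{1/2} * T^{-1/4} gives T = sigma^{-2}.
    With sigma = 0 the noise term never dominates, so return infinity.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    if sigma == 0:
        return math.inf
    return sigma ** -2


if __name__ == "__main__":
    for sigma in (0.0, 0.01, 0.1, 1.0):
        values = [rate_bound(T, sigma) for T in (10**2, 10**4, 10**6)]
        print(f"sigma={sigma}: bound={values}, crossover T={crossover_horizon(sigma)}")
```

With σ = 0 the expression is exactly T^{-1/2}, matching the deterministic rate quoted in the abstract. For σ > 0 the slower T^{-1/4} term takes over once T exceeds roughly σ^{-2}.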
