Convergence Rate for the Last Iterate of Stochastic Gradient Descent Schemes
Marcel Hudiani
Abstract
We study the convergence rate for the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting when the objective function $F$ is globally convex, or non-convex with $\gamma$-Hölder continuous gradient. Using only the discrete Gronwall inequality, without the Robbins–Siegmund theorem, we recover results for both SGD and SHB: $\min_{s \le t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$ for non-convex objectives; $F(w_{\tau \wedge t}) - F_* = o\!\left(t^{\frac{2\gamma}{1+\gamma}\max(p-1,\,-2p+1)-\epsilon}\right)$ for $\beta \in (0, 1)$, where $\tau := \inf\{t > 0 : F(w_t) = F_*\}$; and $\min_{s \le t} F(w_s) - F_* = o(t^{p-1})$ for convex objectives $F$ whose minimum is $F_*$. In addition, we prove that SHB with constant momentum parameter $\beta \in (0, 1)$ attains a convergence rate of $F(w_t) - F_* = O\!\left(t^{\max(p-1,\,-2p+1)} \log^2 \frac{t}{\delta}\right)$ with probability at least $1-\delta$ when $F$ is convex, $\gamma = 1$, and the step size is $\alpha_t = \Theta(t^{-p})$ with $p \in \left(\frac{1}{2}, 1\right)$.
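As a concrete illustration of the two schemes the abstract compares, below is a minimal sketch of last-iterate SGD and stochastic heavy ball with the polynomially decaying step size $\alpha_t = \Theta(t^{-p})$, $p \in (\frac{1}{2}, 1)$. It assumes the classical Polyak formulation of heavy ball; the test objective, the noise model, and all names (`sgd`, `shb`, `noisy_grad`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sgd(noisy_grad, w0, T, p=0.75, c=1.0):
    """SGD with step size alpha_t = c * t**(-p), p in (1/2, 1); returns the last iterate w_T."""
    w = np.asarray(w0, dtype=float)
    for t in range(1, T + 1):
        w = w - c * t ** (-p) * noisy_grad(w)
    return w

def shb(noisy_grad, w0, T, p=0.75, beta=0.9, c=1.0):
    """Stochastic heavy ball in Polyak form with constant momentum beta in (0, 1):
    w_{t+1} = w_t - alpha_t * g_t + beta * (w_t - w_{t-1})."""
    w = np.asarray(w0, dtype=float)
    w_prev = w.copy()  # w_0 = w_{-1}, so the first step reduces to plain SGD
    for t in range(1, T + 1):
        w_next = w - c * t ** (-p) * noisy_grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

# Illustrative convex test problem (gamma = 1): F(w) = ||w||^2 / 2 with F_* = 0,
# and a stochastic gradient given by the true gradient w plus Gaussian noise.
rng = np.random.default_rng(0)
noisy_grad = lambda w: w + 0.1 * rng.standard_normal(w.shape)
print(shb(noisy_grad, w0=np.ones(5), T=10_000))
```

The momentum parameter $\beta$ is held fixed across iterations, matching the constant-momentum setting of the SHB result, while only the step size decays with $t$.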