SOTAVerified

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes

2025-05-25

Itamar Harel, Yonathan Wolanowsky, Gal Vardi, Nathan Srebro, Daniel Soudry


Abstract

We analyze the generalization gap (the gap between the training and test errors) when training a potentially over-parametrized model using a Markovian stochastic training algorithm, initialized from some distribution $\theta_0 \sim p_0$. We focus on Langevin dynamics with a positive temperature $\beta^{-1}$, i.e. gradient descent on a training loss $L$ with infinitesimal step size, perturbed with $\beta^{-1}$-variance Gaussian noise, and lightly regularized or bounded. There, we bound the generalization gap, at any time during training, by $O\big(\sqrt{(\beta \, \mathbb{E} L(\theta_0) + \ln(1/\delta))/N}\big)$ with probability $1-\delta$ over the dataset, where $N$ is the sample size, and $\mathbb{E} L(\theta_0) = O(1)$ with standard initialization scaling. In contrast to previous guarantees, we have no dependence on training time and no reliance on mixing, nor any dependence on dimensionality, gradient norms, or other properties of the loss or model. This guarantee follows from a general analysis of any Markov process-based training that has a Gibbs-style stationary distribution. The proof is surprisingly simple, once we observe that the divergence of the marginal distribution from initialization remains bounded, as implied by a generalized second law of thermodynamics.
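The training procedure the abstract describes can be sketched as a discretized Langevin update: a gradient step on the training loss plus Gaussian noise whose scale is set by the inverse temperature. This is a minimal illustrative sketch, not the authors' code; the quadratic loss, the step size, and all names (`beta`, `langevin_step`, etc.) are assumptions chosen for the example.

```python
import numpy as np

def grad_loss(theta, X, y):
    # Gradient of the mean-squared training loss L(theta) = ||X @ theta - y||^2 / (2N).
    return X.T @ (X @ theta - y) / len(y)

def langevin_step(theta, X, y, step_size, beta, rng):
    # Gradient descent step perturbed by Gaussian noise with variance 2*step_size/beta
    # per coordinate, so that (for small step_size) the stationary distribution is the
    # Gibbs distribution p(theta) proportional to exp(-beta * L(theta)).
    noise = rng.standard_normal(theta.shape)
    return theta - step_size * grad_loss(theta, X, y) + np.sqrt(2 * step_size / beta) * noise

rng = np.random.default_rng(0)
N, d = 50, 10
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d)
theta = rng.standard_normal(d)  # theta_0 ~ p_0, standard initialization scaling
for _ in range(1000):
    theta = langevin_step(theta, X, y, step_size=1e-2, beta=10.0, rng=rng)
```

Here `beta` plays the role of the inverse temperature from the abstract: larger `beta` means less injected noise, and the paper's bound on the generalization gap scales with `beta` through the numerator term $\beta\,\mathbb{E} L(\theta_0)$.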
