Directional Bias Helps Stochastic Gradient Descent to Generalize in Nonparametric Model

2021-09-29

Yiling Luo, Xiaoming Huo, Yajun Mei


Abstract

This paper studies the Stochastic Gradient Descent (SGD) algorithm in kernel regression. The main finding is that SGD with a moderate, annealing step size converges along the direction of the eigenvector corresponding to the largest eigenvalue of the Gram matrix. In contrast, Gradient Descent (GD) with a moderate or small step size converges along the direction corresponding to the smallest eigenvalue. For a general squared risk minimization problem, we show that a directional bias towards a larger eigenvalue of the Hessian (which is the Gram matrix in our case) yields an estimator that is closer to the ground truth. Applied to kernel regression, this means the directional bias helps the SGD estimator generalize better. The result gives one explanation of how noise helps generalization when learning with a nontrivial step size, which may promote further understanding of stochastic algorithms in deep learning. Our theory is supported by simulations and by experiments with neural networks on the FashionMNIST dataset.
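The GD half of the claim and the "closer to the ground truth" argument can be illustrated numerically. The sketch below is not the authors' code and does not reproduce the SGD result. It assumes a quadratic risk L(w) = 0.5 (w - w*)^T H (w - w*), where a synthetic symmetric PSD matrix H with eigenvalues 1, 2 and 5 stands in for the Gram matrix, and the minimizer w* plays the role of the ground truth. The helper names (`make_hessian`, `gd_error_direction`, `distance_at_excess_risk`) are hypothetical. Under GD, the error component along an eigenvector with eigenvalue λ shrinks by a factor of (1 - ηλ) per step, so the component along the smallest eigenvalue decays most slowly and eventually dominates the error direction. Separately, if two estimators are compared at the same excess risk ε (an assumption about how the comparison is made), an error lying along an eigenvector with eigenvalue λ has norm sqrt(2ε/λ), which is smaller for larger λ.

```python
import numpy as np


def make_hessian(eigenvalues, seed=0):
    """Build a symmetric PSD matrix H = Q diag(eigenvalues) Q^T.

    Hypothetical stand-in for the kernel Gram matrix; the spectrum and the
    random rotation Q are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = len(eigenvalues)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal Q
    return q @ np.diag(eigenvalues) @ q.T


def gd_error_direction(h, w_star, w0, step, n_steps):
    """Run GD on L(w) = 0.5 (w - w*)^T H (w - w*); return the unit error vector."""
    w = w0.copy()
    for _ in range(n_steps):
        w = w - step * h @ (w - w_star)  # gradient of L is H (w - w*)
    err = w - w_star
    return err / np.linalg.norm(err)


def distance_at_excess_risk(eigenvalue, excess_risk):
    """Norm of w - w* when the error lies along an eigenvector with this
    eigenvalue and 0.5 * eigenvalue * ||w - w*||^2 equals excess_risk."""
    return np.sqrt(2.0 * excess_risk / eigenvalue)


h = make_hessian(np.array([1.0, 2.0, 5.0]))
vals, vecs = np.linalg.eigh(h)  # eigenvalues in ascending order
v_min = vecs[:, 0]              # eigenvector of the smallest eigenvalue

w_star = np.zeros(3)
w0 = np.random.default_rng(1).standard_normal(3)

# Small step size: 0.1 < 1 / lambda_max, so every error component contracts.
direction = gd_error_direction(h, w_star, w0, step=0.1, n_steps=200)
alignment = abs(direction @ v_min)
print(f"|cos(GD error, v_min)| = {alignment:.6f}")

# Same excess risk, error along each eigen-direction: larger eigenvalue,
# smaller distance to w*.
distances = [distance_at_excess_risk(lam, excess_risk=0.01) for lam in vals]
print("distance to w* at equal excess risk:", np.round(distances, 4))
```

With step size 0.1 the per-step contraction factors are 0.9, 0.8 and 0.5, so after 200 steps the error is almost entirely along `v_min`, matching the stated GD bias towards the smallest eigenvalue. The distances shrink as the eigenvalue grows, which is the sense in which a bias towards a larger eigenvalue leaves an estimator closer to w* in this toy setting.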

Tasks

regression
