Deep Learning Optimization Theory - Trajectory Analysis of Gradient Descent
2022-01-17 · ICLR Blog Track 2022
Anonymous
Abstract
In recent years, a striking yet poorly understood observation has recurred across experiments: gradient descent, a relatively simple first-order optimization method, can successfully optimize enormous numbers of parameters on highly non-convex loss functions. In some sense, this practical observation stands in contrast to classical statistical learning theory. This post surveys the significant progress researchers have made in closing this theory gap and demystifying gradient descent.
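To fix notation for the discussion that follows, here is a minimal sketch of the gradient-descent update rule $w_{t+1} = w_t - \eta \nabla L(w_t)$ on a toy non-convex loss. The loss function, learning rate, and step count are illustrative choices of this sketch, not taken from any particular paper.

```python
def loss(w):
    # A simple non-convex scalar loss: w^4 - 3w^2 + w (two local minima).
    return w**4 - 3 * w**2 + w

def grad(w):
    # Analytic derivative of the loss above: 4w^3 - 6w + 1.
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, lr=0.01, steps=500):
    # Plain first-order gradient descent: repeatedly step against the gradient.
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

w_final = gradient_descent(w0=2.0)
```

Even on this tiny non-convex example, the iterates settle into a nearby local minimum; the puzzle the post discusses is why the same simple rule works so well in the vastly higher-dimensional, non-convex landscapes of deep networks.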