
How Gradient Descent Separates Data with Neural Collapse: A Layer-Peeled Perspective

2021-05-21 · NeurIPS 2021

Wenlong Ji, Yiping Lu, Yiliang Zhang, Zhun Deng, Weijie J Su


Abstract

In this paper, we develop a landscape analysis of a surrogate model to study the inductive bias of the features and parameters learned by neural networks trained with the cross-entropy loss. We show that once the training cross-entropy loss decreases below a certain threshold, the last-layer features and classifiers converge to a particular geometric structure known as neural collapse [papyan2020prevalence, fang2021layer]: the cross-example within-class variability of the last-layer features collapses to zero, and the class means converge to a Simplex Equiangular Tight Frame (ETF). We further show that the cross-entropy loss enjoys a benign global landscape: every critical point is either a global minimizer, which exhibits the neural collapse phenomenon, or a strict saddle whose Hessian has a negative-curvature direction.
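The Simplex ETF geometry mentioned in the abstract can be constructed explicitly: for K classes embedded in d ≥ K dimensions, the class-mean directions have equal norms and equal pairwise inner products of −1/(K−1). The sketch below (an illustrative helper, not code from the paper; the function name `simplex_etf` is our own) builds such a frame via a random partial orthogonal matrix and the standard centering construction:

```python
import numpy as np

def simplex_etf(K, d, seed=0):
    """Construct a K-class Simplex ETF embedded in R^d (requires d >= K).

    Returns a d x K matrix whose columns are unit-norm class-mean
    directions with pairwise inner products -1/(K-1).
    """
    assert d >= K, "ambient dimension must be at least the number of classes"
    rng = np.random.default_rng(seed)
    # Random d x K matrix with orthonormal columns via QR.
    P, _ = np.linalg.qr(rng.standard_normal((d, K)))
    # Center the standard basis and rescale: sqrt(K/(K-1)) * P (I - (1/K) 11^T)
    # yields unit columns with equal pairwise angles.
    M = np.sqrt(K / (K - 1)) * P @ (np.eye(K) - np.ones((K, K)) / K)
    return M

M = simplex_etf(K=4, d=10)
G = M.T @ M  # Gram matrix: diagonal ~ 1, off-diagonal ~ -1/(K-1) = -1/3
```

Under neural collapse, the matrix of (re-normalized) last-layer class means converges to such an M, and the classifier weights align with it, so checking the Gram matrix against this pattern is a simple diagnostic.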
