
How Gradient Descent Separates Data with Neural Collapse: A Layer-Peeled Perspective

2021-05-21 · NeurIPS 2021

Wenlong Ji, Yiping Lu, Yiliang Zhang, Zhun Deng, Weijie J Su


Abstract

In this paper, we develop a landscape analysis of a surrogate model to study the inductive bias of the features and parameters learned by neural networks trained with the cross-entropy loss. We show that once the training cross-entropy loss decreases below a certain threshold, the last-layer features and classifiers converge to a particular geometric structure known as neural collapse [papyan2020prevalence, fang2021layer]: the cross-example within-class variability of the last-layer features collapses to zero, and the class means converge to a Simplex Equiangular Tight Frame (ETF). We further show that the cross-entropy loss enjoys a benign global landscape: every critical point is either a global minimizer, which exhibits the neural collapse phenomenon, or a strict saddle whose Hessian has a negative-curvature direction.
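The Simplex ETF geometry mentioned in the abstract can be constructed explicitly: for K classes embedded in d ≥ K dimensions, the class-mean directions have equal norms and equal pairwise inner products of −1/(K−1). The sketch below (an illustrative helper, not code from the paper; the function name `simplex_etf` is our own) builds such a frame via a random partial orthogonal matrix and the standard centering construction:

```python
import numpy as np

def simplex_etf(K, d, seed=0):
    """Construct a K-class Simplex ETF embedded in R^d (requires d >= K).

    Returns a d x K matrix whose columns are unit-norm class-mean
    directions with pairwise inner products -1/(K-1).
    """
    assert d >= K, "ambient dimension must be at least the number of classes"
    rng = np.random.default_rng(seed)
    # Random d x K matrix with orthonormal columns via QR.
    P, _ = np.linalg.qr(rng.standard_normal((d, K)))
    # Center the standard basis and rescale: sqrt(K/(K-1)) * P (I - (1/K) 11^T)
    # yields unit columns with equal pairwise angles.
    M = np.sqrt(K / (K - 1)) * P @ (np.eye(K) - np.ones((K, K)) / K)
    return M

M = simplex_etf(K=4, d=10)
G = M.T @ M  # Gram matrix: diagonal ~ 1, off-diagonal ~ -1/(K-1) = -1/3
```

Under neural collapse, the matrix of (re-normalized) last-layer class means converges to such an M, and the classifier weights align with it, so checking the Gram matrix against this pattern is a simple diagnostic.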
