
Rethinking Again the Value of Network Pruning -- A Dynamical Isometry Perspective

2021-09-29

Huan Wang, Can Qin, Yue Bai, Yun Fu


Abstract

Several recent works have questioned the value of inheriting weights in structured neural network pruning, since they empirically found that training from scratch can match or even outperform finetuning a pruned model. In this paper, we present evidence that this argument is inaccurate because it relies on improperly small finetuning learning rates. With larger learning rates, our results consistently show that pruning outperforms training from scratch on multiple networks (ResNets, VGG11) and datasets (MNIST, CIFAR10, ImageNet) over most pruning ratios. To understand why the finetuning learning rate plays such a critical role, we examine the theoretical reason through the lens of dynamical isometry, a property of networks that lets gradient signals preserve their norm during propagation. Our results suggest that weight removal in pruning breaks dynamical isometry, which fundamentally accounts for the performance gap between a large finetuning learning rate and a small one. It is therefore necessary to recover dynamical isometry before finetuning. To this end, we also present a regularization-based technique that is simple to implement yet effective at recovering dynamical isometry in modern residual convolutional neural networks.
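The abstract's central claim — that removing weights breaks dynamical isometry — can be illustrated with a toy experiment. A linear layer satisfies dynamical isometry when all singular values of its Jacobian (here, just the weight matrix) are close to 1. The sketch below, which is an illustration and not the paper's method, builds an orthogonal weight matrix (perfect isometry), then magnitude-prunes half its entries and checks how the singular-value spread changes:

```python
import numpy as np

rng = np.random.default_rng(0)

# An orthogonal weight matrix has all singular values equal to 1,
# so its Jacobian is a perfect isometry: gradient norms are preserved.
n = 64
w, _ = np.linalg.qr(rng.standard_normal((n, n)))
sv = np.linalg.svd(w, compute_uv=False)
print("spread before pruning:", sv.max() - sv.min())  # ~0: isometry holds

# Magnitude pruning: zero out the 50% of weights with smallest magnitude.
mask = np.abs(w) >= np.median(np.abs(w))
w_pruned = w * mask
sv_p = np.linalg.svd(w_pruned, compute_uv=False)
print("spread after pruning: ", sv_p.max() - sv_p.min())  # spread grows
```

The singular values of the pruned matrix scatter away from 1, so gradient signals are stretched along some directions and attenuated along others — consistent with the paper's explanation for why a pruned network needs a larger finetuning learning rate, or an explicit isometry-recovery step, before finetuning.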
