AlphaNet: Improved Training of Supernets with Alpha-Divergence

2021-02-16

Dilin Wang, Chengyue Gong, Meng Li, Qiang Liu, Vikas Chandra


Code

github.com/facebookresearch/AlphaNet
Official · In paper · PyTorch · ★ 99
github.com/facebookresearch/AttentiveNAS
In paper · PyTorch · ★ 0

Abstract

Weight-sharing neural architecture search (NAS) is an effective technique for automating efficient neural architecture design. Weight-sharing NAS builds a supernet that contains all candidate architectures as its sub-networks and trains the supernet jointly with the sub-networks. The success of weight-sharing NAS relies heavily on distilling the knowledge of the supernet into the sub-networks. However, we find that the widely used distillation divergence, i.e., KL divergence, may lead to student sub-networks that over-estimate or under-estimate the uncertainty of the teacher supernet, leading to inferior sub-network performance. In this work, we propose to improve supernet training with the more general alpha-divergence. By adaptively selecting the alpha-divergence, we simultaneously prevent both over-estimation and under-estimation of the teacher model's uncertainty. We apply the proposed alpha-divergence-based supernet training to both slimmable neural networks and weight-sharing NAS, and demonstrate significant improvements. Specifically, our discovered model family, AlphaNet, outperforms prior-art models, including BigNAS, Once-for-All networks, and AttentiveNAS, across a wide range of FLOPs regimes. We achieve 80.0% ImageNet top-1 accuracy with only 444M FLOPs. Our code and pretrained models are available at https://github.com/facebookresearch/AlphaNet.
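To make the contrast between KL divergence and the more general alpha-divergence concrete, here is a minimal pure-Python sketch using Amari's alpha-divergence between a teacher distribution p (supernet) and a student distribution q (sub-network). It is an illustration under stated assumptions, not the AlphaNet loss: the helper names, the closed form used, and the "adaptive" rule (take the larger of a negative-alpha and a positive-alpha divergence, with hypothetical defaults alpha = -1 and alpha = 1) are assumptions for this example. The sketch has no autograd; a real training loss would operate on framework tensors.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax that turns logits into a categorical distribution."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max logit to avoid overflow in exp
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for strictly positive categorical distributions p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def alpha_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Amari alpha-divergence D_alpha(p || q) between categorical distributions.

    For alpha not in {0, 1}:
        D_alpha(p || q) = (sum_i p_i**alpha * q_i**(1 - alpha) - 1) / (alpha * (alpha - 1))
    The limit alpha -> 1 is KL(p || q) and the limit alpha -> 0 is KL(q || p),
    so those two points are handled explicitly.
    """
    if math.isclose(alpha, 1.0):
        return kl_divergence(p, q)
    if math.isclose(alpha, 0.0, abs_tol=1e-12):
        return kl_divergence(q, p)
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return (s - 1.0) / (alpha * (alpha - 1.0))


def adaptive_alpha_divergence(
    p: Sequence[float],
    q: Sequence[float],
    alpha_neg: float = -1.0,  # hypothetical default, assumed for illustration
    alpha_pos: float = 1.0,   # hypothetical default; alpha = 1 recovers KL(p || q)
) -> float:
    """Hypothetical adaptive rule: use whichever of the two divergences is larger."""
    return max(alpha_divergence(p, q, alpha_neg), alpha_divergence(p, q, alpha_pos))


# Usage: a teacher (supernet) distribution p and a student (sub-network) distribution q.
teacher = softmax([2.0, 1.0, 0.1])
student = softmax([1.5, 1.2, 0.3])
for a in (-1.0, 0.0, 0.5, 1.0):
    print(f"alpha={a:+.1f}  D_alpha={alpha_divergence(teacher, student, a):.5f}")
print(f"adaptive: {adaptive_alpha_divergence(teacher, student):.5f}")
```

With alpha_pos = 1 the positive branch is exactly the standard distillation loss KL(p || q), so taking the maximum only adds a penalty when the negative-alpha divergence reports a larger mismatch. This is one plausible reading of "adaptively selecting the alpha-divergence"; consult the official repository for the loss actually used.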

Tasks

Image Classification, Neural Architecture Search

Benchmark Results

Dataset	Model	Metric	Claimed (%)	Verified	Status
ImageNet	AlphaNet-A6	Top-1 Accuracy	80.8	n/a	Unverified
ImageNet	AlphaNet-A5	Top-1 Accuracy	80.3	n/a	Unverified
ImageNet	AlphaNet-A4	Top-1 Accuracy	80.0	n/a	Unverified
ImageNet	AlphaNet-A3	Top-1 Accuracy	79.4	n/a	Unverified
ImageNet	AlphaNet-A2	Top-1 Accuracy	79.1	n/a	Unverified
ImageNet	AlphaNet-A1	Top-1 Accuracy	78.9	n/a	Unverified
ImageNet	AlphaNet-A0	Top-1 Accuracy	77.8	n/a	Unverified
