Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training
Shuai Zhao, Liguang Zhou, Wenxiao Wang, Deng Cai, Tin Lun Lam, Yangsheng Xu
Code
- github.com/freeformrobotics/divide-and-co-training (official, PyTorch)
- github.com/mzhaoshuai/Divide-and-Co-training (official, PyTorch)
Abstract
The width of a neural network matters, since increasing the width necessarily increases model capacity. However, network performance does not improve linearly with width and soon saturates. In this case, we argue that increasing the number of networks (ensemble) can achieve better accuracy-efficiency trade-offs than purely increasing the width. To demonstrate this, one large network is divided into several small ones with respect to its parameters and regularization components, so that each small network holds a fraction of the original one's parameters. We then train these small networks together and expose them to different views of the same data to increase their diversity. During this co-training process, the networks can also learn from each other. As a result, the small networks can achieve better ensemble performance than the large one with few or no extra parameters or FLOPs, i.e., better accuracy-efficiency trade-offs. The small networks can also achieve faster inference than the large one by running concurrently. All of the above shows that the number of networks is a new dimension of model scaling. We validate our argument with 8 different neural architectures on common benchmarks through extensive experiments. The code is available at https://github.com/FreeformRobotics/Divide-and-Co-training.
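The abstract's "divide" step can be illustrated with a small sketch. Assuming parameter count grows roughly quadratically with width (true for plain fully-connected and convolutional layers), dividing one wide network into S networks of width w/sqrt(S) keeps the total parameter budget approximately constant. The function names and the quadratic-scaling assumption below are illustrative, not the paper's exact splitting rule:

```python
import math

def split_width(width: int, num_splits: int) -> int:
    """Width for each of S small networks so that their combined
    parameter count roughly matches one network of the given width.
    Assumes parameters scale quadratically with width (illustrative
    assumption, not the paper's exact rule)."""
    return max(1, round(width / math.sqrt(num_splits)))

def total_params(width: int, num_nets: int = 1) -> int:
    """Toy parameter count under the quadratic-scaling assumption."""
    return num_nets * width * width

# Divide a width-1024 network into S=4 small ones:
w_small = split_width(1024, 4)
print(w_small)                       # 512
# Total parameter budget is preserved under the assumption:
print(total_params(1024, 1))         # one large network
print(total_params(w_small, 4))      # four small networks, same total
```

At inference time, the S small networks' softmax outputs are averaged to form the ensemble prediction, which is why the budget-matched split can outperform the single wide model.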
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| CIFAR-10 | PyramidNet-272, S=4 | Percentage correct | 98.71 | — | Unverified |
| CIFAR-10 | WRN-40-10, S=4 | Percentage correct | 98.38 | — | Unverified |
| CIFAR-10 | WRN-28-10, S=4 | Percentage correct | 98.32 | — | Unverified |
| CIFAR-10 | Shake-Shake 26 2x96d, S=4 | Percentage correct | 98.31 | — | Unverified |
| CIFAR-100 | PyramidNet-272, S=4 | Percentage correct | 89.46 | — | Unverified |
| CIFAR-100 | DenseNet-BC-190, S=4 | Percentage correct | 87.44 | — | Unverified |
| CIFAR-100 | WRN-40-10, S=4 | Percentage correct | 86.9 | — | Unverified |
| CIFAR-100 | WRN-28-10, S=4 | Percentage correct | 85.74 | — | Unverified |
| ImageNet | SE-ResNeXt-101, 64x4d, S=2(320px) | Top 1 Accuracy | 83.6 | — | Unverified |
| ImageNet | SE-ResNeXt-101, 64x4d, S=2(416px) | Top 1 Accuracy | 83.34 | — | Unverified |
| ImageNet | ResNeXt-101, 64x4d, S=2(224px) | Top 1 Accuracy | 82.13 | — | Unverified |