
Co-training 2^L Submodels for Visual Recognition

2022-12-09

Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou

Status: Unverified. No reproductions have been submitted yet.


Abstract

We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, "submodels", with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other, by providing a loss that complements the regular loss provided by the one-hot label. Our approach, dubbed cosub, uses a single set of weights, and does not involve a pre-trained external model or temporal averaging. Experimentally, we show that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation. Our approach is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNeXt. Our training strategy improves their results in comparable settings. For instance, a ViT-B pretrained with cosub on ImageNet-21k obtains 87.4% top-1 accuracy at resolution 448 on ImageNet-val.
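To make the abstract's recipe concrete, below is a minimal PyTorch sketch of one cosub training step. It assumes a backbone whose stochastic-depth (drop-path) layers are active in train mode, so that two forward passes through the same weights sample two different submodels. The mixing weight `lam` and the soft cross-entropy form of the teacher term are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of one cosub training step (assumptions: drop-path is
# active in train mode, so two forward passes sample two submodels;
# `lam` and the soft-target loss form are illustrative, not the paper's
# exact hyperparameters).
import torch
import torch.nn.functional as F

def cosub_step(model, x, y, lam=0.5):
    """Co-train two stochastic-depth submodels of a single network."""
    model.train()                       # drop-path on -> random layer subsets
    logits_a = model(x)                 # submodel A (one random subset)
    logits_b = model(x)                 # submodel B (another random subset)

    # Regular supervised loss from the one-hot labels.
    ce = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)

    # Each submodel serves as a soft teacher for the other; teacher
    # probabilities are detached so gradients only flow to the student.
    soft_a = F.cross_entropy(logits_a, logits_b.softmax(dim=-1).detach())
    soft_b = F.cross_entropy(logits_b, logits_a.softmax(dim=-1).detach())

    return (1 - lam) * ce + lam * (soft_a + soft_b)
```

Note that a single set of weights is used throughout: both "teachers" are just different stochastic-depth samples of the same model, so no external pre-trained model or temporal averaging is needed.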

Tasks

Image Classification, Semantic Segmentation

Benchmark Results

| Dataset  | Model                 | Metric         | Claimed | Verified | Status     |
|----------|-----------------------|----------------|---------|----------|------------|
| ImageNet | ViT-H@224 (cosub)     | Top-1 Accuracy | 88.0    | —        | Unverified |
| ImageNet | ViT-L@224 (cosub)     | Top-1 Accuracy | 87.5    | —        | Unverified |
| ImageNet | Swin-L@224 (cosub)    | Top-1 Accuracy | 87.1    | —        | Unverified |
| ImageNet | ViT-B@224 (cosub)     | Top-1 Accuracy | 86.3    | —        | Unverified |
| ImageNet | Swin-B@224 (cosub)    | Top-1 Accuracy | 86.2    | —        | Unverified |
| ImageNet | ConvNeXt-B@224 (cosub)| Top-1 Accuracy | 85.8    | —        | Unverified |
| ImageNet | PiT-B@224 (cosub)     | Top-1 Accuracy | 85.8    | —        | Unverified |
| ImageNet | ViT-M@224 (cosub)     | Top-1 Accuracy | 85.0    | —        | Unverified |
| ImageNet | RegNetY-16GF@224 (cosub) | Top-1 Accuracy | 84.2 | —        | Unverified |
| ImageNet | ViT-S@224 (cosub)     | Top-1 Accuracy | 83.1    | —        | Unverified |

Reproductions

None submitted yet.