SOTAVerified

A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

2019-10-01 · arXiv 2020 · Code Available

Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby

Abstract

Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, reconstruction error). We present the Visual Task Adaptation Benchmark (VTAB), which defines good representations as those that adapt to diverse, unseen tasks with few examples. With VTAB, we conduct a large-scale study of many popular publicly-available representation learning algorithms. We carefully control confounders such as architecture and tuning budget. We address questions like: How effective are ImageNet representations beyond standard natural datasets? How do representations trained via generative and discriminative models compare? To what extent can self-supervision replace labels? And, how close are we to general visual representations?
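VTAB's headline metric is a single score aggregated over many adaptation tasks, each tuned on a small labelled set (1000 examples in the VTAB-1k setting) and evaluated by top-1 accuracy. A minimal sketch of that aggregation step is below; the task names and accuracy values are made-up placeholders, and this is not the official `task_adaptation` API.

```python
def vtab_score(per_task_accuracy):
    """Aggregate per-task top-1 accuracies into one VTAB score (the mean).

    `per_task_accuracy` maps task name -> top-1 accuracy in [0, 1],
    obtained after adapting the representation to each task.
    """
    return sum(per_task_accuracy.values()) / len(per_task_accuracy)


# Placeholder accuracies for three illustrative tasks (not real results):
accuracies = {"cifar100": 0.72, "dtd": 0.68, "clevr_count": 0.55}
print(round(100 * vtab_score(accuracies), 1))  # mean top-1, in percent
```

The unweighted mean treats every task equally, which is what makes the benchmark a test of *general* representations rather than performance on any single dataset family.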

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| VTAB-1k | S4L-Exemplar-ResNet50-LargeHyperSweep | Top-1 Accuracy | 72.7 | | Unverified |
| VTAB-1k | S4L-Rotation-ResNet50-LargeHyperSweep | Top-1 Accuracy | 71.5 | | Unverified |
| VTAB-1k | ImageNet-ResNet50-LargeHyperSweep | Top-1 Accuracy | 71.2 | | Unverified |
| VTAB-1k | S4L-Rotation-ResNet50 | Top-1 Accuracy | 67.5 | | Unverified |
| VTAB-1k | S4L-Exemplar-ResNet50 | Top-1 Accuracy | 67.0 | | Unverified |
| VTAB-1k | ImageNet-ResNet50 | Top-1 Accuracy | 65.6 | | Unverified |
| VTAB-1k | S4L-10%-Rotation-ResNet50 | Top-1 Accuracy | 64.8 | | Unverified |
| VTAB-1k | S4L-10%-Exemplar-ResNet50 | Top-1 Accuracy | 63.9 | | Unverified |
| VTAB-1k | ImageNet-10%-ResNet50 | Top-1 Accuracy | 61.6 | | Unverified |
| VTAB-1k | SelfSup-Rotation-ResNet50 | Top-1 Accuracy | 59.5 | | Unverified |
| VTAB-1k | ResNet50-LargeHyperSweep | Top-1 Accuracy | 59.2 | | Unverified |
| VTAB-1k | BigBiGAN-ResNet50 | Top-1 Accuracy | 59.1 | | Unverified |
| VTAB-1k | SelfSup-Exemplar-ResNet50 | Top-1 Accuracy | 57.5 | | Unverified |
| VTAB-1k | SelfSup-Jigsaw-ResNet50 | Top-1 Accuracy | 51.1 | | Unverified |
| VTAB-1k | SelfSup-RelativePatchLoc-ResNet50 | Top-1 Accuracy | 50.8 | | Unverified |
| VTAB-1k | Unconditional-BigGAN-ResNet50 | Top-1 Accuracy | 44.0 | | Unverified |
| VTAB-1k | ResNet50 | Top-1 Accuracy | 42.1 | | Unverified |
| VTAB-1k | VAE | Top-1 Accuracy | 37.5 | | Unverified |
| VTAB-1k | WAE-MMD | Top-1 Accuracy | 37.3 | | Unverified |
| VTAB-1k | Conditional-BigGAN | Top-1 Accuracy | 35.3 | | Unverified |
| VTAB-1k | WAE-GAN | Top-1 Accuracy | 32.0 | | Unverified |
| VTAB-1k | WAE-UKL | Top-1 Accuracy | 31.0 | | Unverified |
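For a quick comparison across method families, the claimed scores above can be grouped and ranked programmatically. The snippet below uses a representative subset of the table's claimed numbers; the grouping labels are an informal reading of the model names, not part of the benchmark itself.

```python
# Claimed VTAB-1k mean top-1 accuracies, taken from the table above.
# None of these have been independently verified on this page.
claimed = {
    "S4L-Exemplar-ResNet50-LargeHyperSweep": 72.7,   # semi-supervised
    "ImageNet-ResNet50": 65.6,                        # supervised pretraining
    "SelfSup-Rotation-ResNet50": 59.5,                # self-supervised
    "BigBiGAN-ResNet50": 59.1,                        # generative
    "ResNet50": 42.1,                                 # from scratch
    "VAE": 37.5,                                      # generative
}

# Rank models by claimed score, best first.
ranking = sorted(claimed.items(), key=lambda kv: kv[1], reverse=True)
for model, score in ranking:
    print(f"{score:5.1f}  {model}")
```

The ordering already reflects the paper's questions: semi-supervised (S4L) methods lead, fully supervised ImageNet pretraining follows, and purely generative representations trail discriminative ones.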
