MLP-Mixer: An all-MLP Architecture for Vision

2021-05-04 · NeurIPS 2021 · Code Available

Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Code

github.com/lucidrains/mlp-mixer-pytorch
pytorch★ 1,056
github.com/rishikksh20/MLP-Mixer-pytorch
pytorch★ 216
github.com/ashishpatel26/Vision-Transformer-Keras-Tensorflow-Pytorch-Examples
pytorch★ 111
github.com/bangoc123/mlp-mixer
tf★ 91
github.com/sayakpaul/MLP-Mixer-CIFAR10
none★ 59
github.com/omihub777/mlp-mixer-cifar
pytorch★ 37
github.com/jeonsworld/MLP-Mixer-Pytorch
pytorch★ 36
github.com/isaaccorley/mlp-mixer-pytorch
pytorch★ 31
github.com/jaketae/mlp-mixer
pytorch★ 30
github.com/leaderj1001/Bag-of-MLP
pytorch★ 20

Abstract

Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well established CNNs and Transformers.
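
The two layer types described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the layer normalization without learned parameters, the skip connections, the GELU approximation, the initialization scale, and all names and sizes (`mixer_block`, `init_mlp`, 16 patches, 32 channels) are assumptions chosen for the example and are not stated on this page.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU (assumed nonlinearity for this sketch).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize each row over its last axis; no learned scale or shift (simplification).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def init_mlp(rng, dim, hidden):
    # Two dense layers mapping dim -> hidden -> dim (hypothetical initialization).
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))

def mlp(x, w1, b1, w2, b2):
    # Applied independently to every row of x.
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_block(x, token_params, channel_params):
    # x has shape (num_patches, channels).
    # Token mixing: transpose so the MLP acts across patches, one channel at a time.
    y = layer_norm(x).T
    x = x + mlp(y, *token_params).T
    # Channel mixing: the MLP acts on each patch's feature vector independently.
    y = layer_norm(x)
    return x + mlp(y, *channel_params)

rng = np.random.default_rng(0)
num_patches, channels = 16, 32
x = rng.normal(size=(num_patches, channels))
token_params = init_mlp(rng, num_patches, 64)
channel_params = init_mlp(rng, channels, 128)
out = mixer_block(x, token_params, channel_params)
print(out.shape)  # (16, 32)
```

The only difference between the two mixing steps is the transpose: the same kind of MLP is applied along the patch axis in one case and along the channel axis in the other, which is why the block needs neither convolution nor attention.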

Tasks

Image Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ImageNet	Mixer-H/14 (JFT-300M pre-train)	Top 1 Accuracy	87.94	—	Unverified
ImageNet	ViT-L/16 Dosovitskiy et al. (2021)	Top 1 Accuracy	85.3	—	Unverified
ImageNet	Mixer-B/16	Top 1 Accuracy	76.44	—	Unverified
ImageNet ReaL	Mixer-H/14-448 (JFT-300M pre-train)	Accuracy	90.18	—	Unverified
ImageNet ReaL	Mixer-H/14 (JFT-300M pre-train)	Accuracy	87.86	—	Unverified
OmniBenchmark	MLP-Mixer	Average Top-1 Accuracy	32.2	—	Unverified
