ResT: An Efficient Transformer for Visual Recognition

2021-05-28 · NeurIPS 2021 · Code Available

Qinglong Zhang, YuBin Yang


Code

github.com/wofmanaf/ResT
Official · In paper · pytorch · ★ 292
github.com/mindspore-courses/External-Attention-MindSpore/blob/main/model/attention/EMSA.py
mindspore · ★ 0
github.com/BR-IDL/PaddleViT/tree/develop/image_classification/ResT
paddle · ★ 0

Abstract

This paper presents an efficient multi-scale vision Transformer, called ResT, that capably serves as a general-purpose backbone for image recognition. Unlike existing Transformer methods, which employ standard Transformer blocks to process raw images at a fixed resolution, ResT has several advantages: (1) A memory-efficient multi-head self-attention is built, which compresses the memory with a simple depth-wise convolution and projects the interaction across the attention-head dimension while preserving the diversity of the multiple heads; (2) Position encoding is constructed as spatial attention, which is more flexible and can handle input images of arbitrary size without interpolation or fine-tuning; (3) Instead of the straightforward tokenization at the beginning of each stage, we design the patch embedding as a stack of overlapping strided convolution operations on the 2D-reshaped token map. We comprehensively validate ResT on image classification and downstream tasks. Experimental results show that the proposed ResT can outperform recent state-of-the-art backbones by a large margin, demonstrating the potential of ResT as a strong backbone. The code and models will be made publicly available at https://github.com/wofmanaf/ResT.
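To give a feel for why compressing the keys and values saves memory, the following is a rough arithmetic sketch, not the authors' implementation. It assumes a hypothetical reduction stride for the depth-wise convolution, assumes the convolution yields ceil(size / stride) positions per axis, and uses a 56 x 56 token map purely as an illustrative size. The second function is the standard convolution output-size formula, included to show that a kernel larger than its stride (an overlapping convolution) still downsamples the token map. The function names and all numeric settings are illustrative assumptions.

```python
import math


def attention_score_entries(height, width, reduction=1):
    """Entries in one head's attention score matrix for a height x width token map.

    `reduction` is a hypothetical stride applied to the key/value map;
    reduction=1 corresponds to standard self-attention.
    """
    queries = height * width
    # Assumption: the strided convolution keeps ceil(size / stride) positions per axis.
    keys = math.ceil(height / reduction) * math.ceil(width / reduction)
    return queries * keys


def conv_output_size(size, kernel, stride, padding):
    """Standard output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


full = attention_score_entries(56, 56)                  # 3136 queries x 3136 keys
reduced = attention_score_entries(56, 56, reduction=8)  # 3136 queries x 49 keys
print(full, reduced, full // reduced)                   # 9834496 153664 64

# An overlapping convolution (kernel 3 > stride 2) halves a 56-wide map.
print(conv_output_size(56, kernel=3, stride=2, padding=1))  # 28
```

Under these assumptions the score matrix shrinks by the square of the reduction stride, which is where the memory saving in point (1) would come from.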

Tasks

Diversity, Image Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ImageNet	ResT-Large	Top 1 Accuracy	83.6	—	Unverified
ImageNet	ResT-Small	Top 1 Accuracy	79.6	—	Unverified
