PVT v2: Improved Baselines with Pyramid Vision Transformer

2021-06-25

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Code

Repository	Framework	Stars	Notes
github.com/whai362/PVT	pytorch	1,889	Official, in paper
github.com/rwightman/pytorch-image-models	pytorch	36,538	
github.com/open-mmlab/mmdetection	pytorch	32,525	
github.com/open-mmlab/mmpose	pytorch	7,439	
github.com/PaddlePaddle/PaddleClas	paddle	5,788	
github.com/sithu31296/semantic-segmentation	pytorch	939	
github.com/shinya7y/UniverseNet	pytorch	433	
github.com/martinsbruveris/tensorflow-image-models	tf	291	
github.com/xiaohu2015/pvt_detectron2	pytorch	29	
github.com/Owais-Ansari/Unet3plus	pytorch	10	

Abstract

Transformers have recently shown encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (1) a linear-complexity attention layer, (2) overlapping patch embedding, and (3) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and achieves significant improvements on fundamental vision tasks such as classification, detection, and segmentation. Notably, PVT v2 achieves comparable or better performance than recent works such as Swin Transformer. We hope this work will facilitate state-of-the-art Transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
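
To make the "linear complexity" and "overlapping patch embedding" claims concrete, the following is a minimal back-of-the-envelope sketch, not the authors' implementation. It assumes the attention cost is dominated by the two matrix products (queries against keys, then weights against values), that keys and values are pooled to a fixed grid (the `pool_size=7` default is an assumption), and that the patch embedding is a strided convolution whose kernel is larger than its stride (the `kernel=7, stride=4, padding=3` defaults are likewise assumptions). All function names are hypothetical.

```python
def attention_cost(num_queries: int, num_keys: int, channels: int) -> int:
    """Rough multiply-accumulate count for Q @ K^T plus the weighted sum over V."""
    return 2 * num_queries * num_keys * channels


def full_attention_cost(h: int, w: int, channels: int) -> int:
    """Every token attends to every token, so cost grows with (h * w) ** 2."""
    n = h * w
    return attention_cost(n, n, channels)


def pooled_attention_cost(h: int, w: int, channels: int, pool_size: int = 7) -> int:
    """Keys/values are pooled to a fixed pool_size x pool_size grid (assumed),
    so cost grows linearly with the number of query tokens h * w."""
    return attention_cost(h * w, pool_size * pool_size, channels)


def overlapping_patch_grid(size: int, kernel: int = 7, stride: int = 4,
                           padding: int = 3) -> int:
    """Output side length of a strided convolution; neighbouring patches
    overlap whenever kernel > stride (values here are assumptions)."""
    return (size + 2 * padding - kernel) // stride + 1


if __name__ == "__main__":
    for side in (56, 112):
        print(side,
              full_attention_cost(side, side, 64),
              pooled_attention_cost(side, side, 64))
    print(overlapping_patch_grid(224))
```

Under these assumptions, doubling the feature-map side multiplies the full-attention count by 16 but the pooled count by only 4, which is the sense in which the pooled variant is linear in the number of tokens.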

Tasks

Image Classification, Object Detection, Panoptic Segmentation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ImageNet	PVTv2-B3	Top 1 Accuracy	83.2	—	Unverified
ImageNet	PVTv2-B1	Top 1 Accuracy	78.7	—	Unverified
ImageNet	PVTv2-B0	Top 1 Accuracy	70.5	—	Unverified
ImageNet	PVTv2-B4	Top 1 Accuracy	83.8	—	Unverified
ImageNet	PVTv2-B2	Top 1 Accuracy	82.0	—	Unverified
