Understanding The Robustness in Vision Transformers
Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez
Code
- github.com/NVlabs/FAN (official, PyTorch, ★ 481)
- github.com/NVlabs/STL (PyTorch, ★ 35)
Abstract
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In this paper, we examine the role of self-attention in learning robust representations. Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations. We further propose a family of fully attentional networks (FANs) that strengthen this capability by incorporating an attentional channel processing design. We validate the design comprehensively on various hierarchical backbones. Our model achieves a state-of-the-art 87.1% accuracy and 35.8% mCE on ImageNet-1k and ImageNet-C with 76.8M parameters. We also demonstrate state-of-the-art accuracy and robustness in two downstream tasks: semantic segmentation and object detection. Code is available at: https://github.com/NVlabs/FAN.
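The abstract's "attentional channel processing" refers to applying self-attention along the channel axis rather than the token axis, so that channels are reweighted by their affinities to one another. The sketch below is an illustrative simplification of that idea, not the FAN implementation; the function and weight names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x, w_q, w_k, w_v):
    """Toy channel attention: attend over channels instead of tokens.
    x: (tokens, channels); w_q/w_k/w_v: (channels, channels) projections.
    Hypothetical sketch, not the FAN module."""
    n, c = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # (n, c) each
    q, k, v = q.T, k.T, v.T                      # (c, n): channels act as queries
    attn = softmax((q @ k.T) / np.sqrt(n))       # (c, c) channel-to-channel affinities
    return (attn @ v).T                          # back to (n, c)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))                 # 16 tokens, 8 channels
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
y = channel_attention(x, w_q, w_k, w_v)          # same shape as the input
```

Because the attention map is `channels x channels`, its cost scales with channel count rather than token count, which is one motivation given for channel-wise attention designs.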
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| ImageNet-A | FAN-Hybrid-L (IN-21K, 384) | Top-1 accuracy % | 74.5 | — | Unverified |
| ImageNet-C | FAN-L-Hybrid | mean Corruption Error (mCE) | 43 | — | Unverified |
| ImageNet-C | FAN-B-Hybrid (IN-22K) | mean Corruption Error (mCE) | 41 | — | Unverified |
| ImageNet-C | FAN-L-Hybrid (IN-22K) | mean Corruption Error (mCE) | 35.8 | — | Unverified |
| ImageNet-R | FAN-Hybrid-L (IN-21K, 384) | Top-1 Error Rate | 28.9 | — | Unverified |