Polarized Self-Attention: Towards High-quality Pixel-wise Regression
Huajun Liu, Fuqiang Liu, Xinyi Fan, Dong Huang
Code
- github.com/DeLightCMU/PSA (official, in paper) — pytorch, ★ 255
- github.com/PaddlePaddle/PaddleSeg — paddle, ★ 9,319
- github.com/sithu31296/semantic-segmentation — pytorch, ★ 939
- github.com/sithu31296/pose-estimation — pytorch, ★ 54
- github.com/mindspore-courses/External-Attention-MindSpore/blob/main/model/attention/PolarizedSelfAttention.py — mindspore, ★ 0
- github.com/xmu-xiaoma666/External-Attention-pytorch/blob/master/attention/PolarizedSelfAttention.py — pytorch, ★ 0
Abstract
Pixel-wise regression is probably the most common problem in fine-grained computer vision tasks, such as estimating keypoint heatmaps and segmentation masks. These regression problems are particularly challenging because they require, at low computational overhead, modeling long-range dependencies on high-resolution inputs/outputs to estimate highly nonlinear pixel-wise semantics. While attention mechanisms in Deep Convolutional Neural Networks (DCNNs) have become popular for capturing long-range dependencies, element-specific attention, such as Nonlocal blocks, is highly complex and noise-sensitive to learn, and most simplified attention hybrids aim for the best compromise across multiple types of tasks. In this paper, we present the Polarized Self-Attention (PSA) block, which incorporates two critical designs towards high-quality pixel-wise regression: (1) Polarized filtering: keeping high internal resolution in both channel and spatial attention computation while completely collapsing input tensors along their counterpart dimensions. (2) Enhancement: composing non-linearities that directly fit the output distribution of typical fine-grained regression, such as the 2D Gaussian distribution (keypoint heatmaps) or the 2D Binomial distribution (binary segmentation masks). PSA appears to exhaust the representation capacity within its channel-only and spatial-only branches, such that there are only marginal metric differences between its sequential and parallel layouts. Experimental results show that PSA boosts standard baselines by 2-4 points, and boosts state-of-the-art models by 1-2 points, on 2D pose estimation and semantic segmentation benchmarks.
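The two designs in the abstract can be sketched in PyTorch. Below is a minimal, hedged sketch of a PSA block in its sequential layout: the channel-only branch collapses the spatial dimension with a softmax-weighted sum while keeping full channel resolution, the spatial-only branch collapses the channel dimension via global pooling plus softmax while keeping full spatial resolution, and each branch ends with the softmax-sigmoid "enhancement". The 1x1-conv layer names and the channel reduction ratio of 2 are assumptions for illustration, not taken from the official code.

```python
# Minimal sketch of Polarized Self-Attention (sequential layout).
# Assumptions: 1x1 convs for all projections, internal channels C/2.
import torch
import torch.nn as nn


class PolarizedSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        c = channels // 2
        # Channel-only branch: spatial dim fully collapsed, channel dim kept at C.
        self.ch_wq = nn.Conv2d(channels, 1, kernel_size=1)   # query: C -> 1
        self.ch_wv = nn.Conv2d(channels, c, kernel_size=1)   # value: C -> C/2
        self.ch_wz = nn.Conv2d(c, channels, kernel_size=1)   # restore C
        self.ch_norm = nn.LayerNorm(channels)
        # Spatial-only branch: channel dim fully collapsed, spatial dim kept at HxW.
        self.sp_wq = nn.Conv2d(channels, c, kernel_size=1)
        self.sp_wv = nn.Conv2d(channels, c, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, ch, h, w = x.shape
        # Channel-only attention: softmax over spatial positions, then sigmoid.
        q = self.ch_wq(x).view(b, 1, h * w).softmax(dim=-1)         # (B, 1, HW)
        v = self.ch_wv(x).view(b, -1, h * w)                        # (B, C/2, HW)
        z = torch.bmm(v, q.transpose(1, 2)).unsqueeze(-1)           # (B, C/2, 1, 1)
        a = self.ch_wz(z).view(b, ch)                               # (B, C)
        ch_attn = self.ch_norm(a).sigmoid().view(b, ch, 1, 1)
        x = x * ch_attn
        # Spatial-only attention: softmax over channels, then sigmoid.
        q = self.pool(self.sp_wq(x)).view(b, 1, -1).softmax(dim=-1)  # (B, 1, C/2)
        v = self.sp_wv(x).view(b, -1, h * w)                         # (B, C/2, HW)
        sp_attn = torch.bmm(q, v).view(b, 1, h, w).sigmoid()         # (B, 1, H, W)
        return x * sp_attn
```

The parallel layout mentioned in the abstract would instead apply both attention maps to the same input and sum the results; per the abstract, the two layouts differ only marginally in metrics.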
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| COCO test-dev | UDP-Pose-PSA(384x288) | AP | 79.5 | — | Unverified |
| COCO test-dev | UDP-Pose-PSA(256x192) | AP | 78.9 | — | Unverified |