Polarized Self-Attention: Towards High-quality Pixel-wise Regression
Huajun Liu, Fuqiang Liu, Xinyi Fan, Dong Huang
Code
- github.com/DeLightCMU/PSA (official, in paper) — pytorch, ★ 255
- github.com/PaddlePaddle/PaddleSeg — paddle, ★ 9,319
- github.com/sithu31296/semantic-segmentation — pytorch, ★ 939
- github.com/sithu31296/pose-estimation — pytorch, ★ 54
- github.com/mindspore-courses/External-Attention-MindSpore/blob/main/model/attention/PolarizedSelfAttention.py — mindspore, ★ 0
- github.com/xmu-xiaoma666/External-Attention-pytorch/blob/master/attention/PolarizedSelfAttention.py — pytorch, ★ 0
Abstract
Pixel-wise regression is probably the most common problem in fine-grained computer vision tasks, such as estimating keypoint heatmaps and segmentation masks. These regression problems are particularly challenging because they require, at low computational overhead, modeling long-range dependencies on high-resolution inputs/outputs to estimate highly nonlinear pixel-wise semantics. While attention mechanisms in Deep Convolutional Neural Networks (DCNNs) have become popular for capturing long-range dependencies, element-specific attention, such as Nonlocal blocks, is highly complex and noise-sensitive to learn, and most simplified attention hybrids aim for the best compromise across multiple types of tasks. In this paper, we present the Polarized Self-Attention (PSA) block, which incorporates two critical designs towards high-quality pixel-wise regression: (1) Polarized filtering: keeping high internal resolution in both channel and spatial attention computation while completely collapsing input tensors along their counterpart dimensions. (2) Enhancement: composing non-linearities that directly fit the output distribution of typical fine-grained regression, such as the 2D Gaussian distribution (keypoint heatmaps) or the 2D Binomial distribution (binary segmentation masks). PSA appears to exhaust the representation capacity within its channel-only and spatial-only branches, such that there are only marginal metric differences between its sequential and parallel layouts. Experimental results show that PSA boosts standard baselines by 2-4 points, and boosts state-of-the-art models by 1-2 points, on 2D pose estimation and semantic segmentation benchmarks.
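The two designs in the abstract can be sketched in PyTorch. Below is a minimal, hedged sketch of a PSA block in its sequential layout: the channel-only branch collapses the spatial dimension with a softmax-weighted sum while keeping full channel resolution, the spatial-only branch collapses the channel dimension via global pooling plus softmax while keeping full spatial resolution, and each branch ends with the softmax-sigmoid "enhancement". The 1x1-conv layer names and the channel reduction ratio of 2 are assumptions for illustration, not taken from the official code.

```python
# Minimal sketch of Polarized Self-Attention (sequential layout).
# Assumptions: 1x1 convs for all projections, internal channels C/2.
import torch
import torch.nn as nn


class PolarizedSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        c = channels // 2
        # Channel-only branch: spatial dim fully collapsed, channel dim kept at C.
        self.ch_wq = nn.Conv2d(channels, 1, kernel_size=1)   # query: C -> 1
        self.ch_wv = nn.Conv2d(channels, c, kernel_size=1)   # value: C -> C/2
        self.ch_wz = nn.Conv2d(c, channels, kernel_size=1)   # restore C
        self.ch_norm = nn.LayerNorm(channels)
        # Spatial-only branch: channel dim fully collapsed, spatial dim kept at HxW.
        self.sp_wq = nn.Conv2d(channels, c, kernel_size=1)
        self.sp_wv = nn.Conv2d(channels, c, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, ch, h, w = x.shape
        # Channel-only attention: softmax over spatial positions, then sigmoid.
        q = self.ch_wq(x).view(b, 1, h * w).softmax(dim=-1)         # (B, 1, HW)
        v = self.ch_wv(x).view(b, -1, h * w)                        # (B, C/2, HW)
        z = torch.bmm(v, q.transpose(1, 2)).unsqueeze(-1)           # (B, C/2, 1, 1)
        a = self.ch_wz(z).view(b, ch)                               # (B, C)
        ch_attn = self.ch_norm(a).sigmoid().view(b, ch, 1, 1)
        x = x * ch_attn
        # Spatial-only attention: softmax over channels, then sigmoid.
        q = self.pool(self.sp_wq(x)).view(b, 1, -1).softmax(dim=-1)  # (B, 1, C/2)
        v = self.sp_wv(x).view(b, -1, h * w)                         # (B, C/2, HW)
        sp_attn = torch.bmm(q, v).view(b, 1, h, w).sigmoid()         # (B, 1, H, W)
        return x * sp_attn
```

The parallel layout mentioned in the abstract would instead apply both attention maps to the same input and sum the results; per the abstract, the two layouts differ only marginally in metrics.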
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| COCO test-dev | UDP-Pose-PSA(384x288) | AP | 79.5 | — | Unverified |
| COCO test-dev | UDP-Pose-PSA(256x192) | AP | 78.9 | — | Unverified |