Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar
Code
- github.com/facebookresearch/Mask2Former (official, PyTorch, ★ 3,289)
- github.com/huggingface/transformers (PyTorch, ★ 158,292)
- github.com/open-mmlab/mmdetection (PyTorch, ★ 32,525)
- github.com/alibaba/EasyCV (PyTorch, ★ 1,949)
- github.com/DdeGeus/Mask2Former-IBS (PyTorch, ★ 7)
- github.com/nihalsid/mask2former (PyTorch, ★ 4)
- github.com/MindSpore-scientific/code-7/tree/main/Mask2Former (★ 0)
Abstract
Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).
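The core mechanism the abstract names, masked attention, restricts each query's cross-attention to the foreground of that query's predicted mask (attention logits outside the mask are set to negative infinity before the softmax). Below is a minimal NumPy sketch of that idea under stated assumptions; the function name, shapes, threshold, and the empty-mask fallback are illustrative choices, not taken from the official Mask2Former implementation.

```python
import numpy as np

def masked_attention(queries, keys, values, mask_logits, threshold=0.5):
    """Sketch of masked cross-attention.

    queries:     (Q, d) object-query features
    keys/values: (N, d) flattened image features
    mask_logits: (Q, N) per-query mask predictions over image locations
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)  # (Q, N) attention logits

    # Block locations whose predicted mask probability is below threshold.
    mask_prob = 1.0 / (1.0 + np.exp(-mask_logits))
    blocked = mask_prob < threshold  # True = attention not allowed

    # Illustrative fallback: if a query's mask is empty, attend everywhere
    # instead of producing an all -inf row.
    empty = blocked.all(axis=-1, keepdims=True)
    blocked = blocked & ~empty

    logits = np.where(blocked, -np.inf, logits)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values  # (Q, d) mask-localized query updates
```

In the full architecture this step runs inside each Transformer decoder layer, with the masks predicted by the previous layer supplying `mask_logits`.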
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| WildScenes | Mask2Former (ResNet-50) | mIoU | 43.71 | — | Unverified |
| WildScenes | Mask2Former (Swin-L) | mIoU | 47.85 | — | Unverified |