
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation

2019-09-24 · ECCV 2020 · Code Available

Yuhui Yuan, Xiaokang Chen, Xilin Chen, Jingdong Wang


Abstract

In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. Our motivation is that the label of a pixel is the category of the object that the pixel belongs to. We present a simple yet effective approach, object-contextual representations, which characterizes a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of the ground-truth segmentation. Second, we compute each object region representation by aggregating the representations of the pixels lying in that object region. Last, we compute the relation between each pixel and each object region, and augment the representation of each pixel with the object-contextual representation: a weighted aggregation of all the object region representations according to their relations with the pixel. We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff. Our submission "HRNet + OCR + SegFix" achieves 1st place on the Cityscapes leaderboard at the time of submission. Code is available at https://git.io/openseg and https://git.io/HRNet.OCR. We also rephrase the object-contextual representation scheme using the Transformer encoder-decoder framework; the details are presented in Section 3.3.
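The three steps in the abstract (soft object regions, region representations by aggregation, relation-weighted context per pixel) can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the authors' implementation: `ocr_augment` and its argument names are hypothetical, region scores come in as precomputed logits rather than a learned, supervised head, and the final fusion is plain concatenation instead of the paper's learned transform.

```python
import numpy as np

def ocr_augment(pixels, region_logits):
    """Illustrative sketch of object-contextual representations (OCR).

    pixels:        (HW, C) per-pixel features
    region_logits: (HW, K) coarse per-pixel class scores; in the paper
                   these are learned under ground-truth supervision
    returns:       (HW, 2C) pixel features concatenated with their
                   object-contextual representations
    """
    def softmax(x, axis):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    # Step 1: soft object regions -- for each of the K classes, a
    # spatial distribution over the HW pixels.
    regions = softmax(region_logits, axis=0)            # (HW, K)

    # Step 2: object region representation = weighted aggregation of
    # the pixels (softly) assigned to that region.
    obj_repr = regions.T @ pixels                       # (K, C)

    # Step 3: relation between each pixel and each object region
    # (dot-product similarity, normalized over the K regions) ...
    relation = softmax(pixels @ obj_repr.T, axis=1)     # (HW, K)
    # ... then the object-contextual representation of each pixel is a
    # relation-weighted aggregation of all object region representations.
    context = relation @ obj_repr                       # (HW, C)

    # Step 4: augment each pixel with its object context (the paper
    # fuses with a learned transform; concatenation here for brevity).
    return np.concatenate([pixels, context], axis=1)
```

Because the relation weights are a softmax over regions, each pixel's context vector is a convex combination of the K region representations, which is the sense in which a pixel is characterized by "the representation of the corresponding object class".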

Tasks

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| ADE20K val | HRNetV2 + OCR + RMI (PaddleClas pretrained) | mIoU | 47.98 | — | Unverified |
| ADE20K val | OCR (ResNet-101) | mIoU | 45.28 | — | Unverified |
| ADE20K val | OCR (HRNetV2-W48) | mIoU | 45.66 | — | Unverified |
| BDD100K val | OCRNet | mIoU | 60.1 | — | Unverified |
| Cityscapes test | OCR (ResNet-101, coarse) | Mean IoU (class) | 82.4 | — | Unverified |
| Cityscapes test | HRNetV2 + OCR + SegFix | Mean IoU (class) | 84.5 | — | Unverified |
| Cityscapes test | OCR (HRNetV2-W48, coarse) | Mean IoU (class) | 83.0 | — | Unverified |
| Cityscapes test | OCR (ResNet-101) | Mean IoU (class) | 81.8 | — | Unverified |
| Cityscapes test | HRNetV2 + OCR (w/ ASP) | Mean IoU (class) | 83.7 | — | Unverified |
| Cityscapes val | OCR (ResNet-101-FCN) | mIoU | 80.6 | — | Unverified |
| Cityscapes val | HRNetV2 + OCR + RMI (PaddleClas pretrained) | mIoU | 83.6 | — | Unverified |
| COCO-Stuff test | OCR (ResNet-101) | mIoU | 39.5 | — | Unverified |
| COCO-Stuff test | HRNetV2 + OCR + RMI (PaddleClas pretrained) | mIoU | 45.2 | — | Unverified |
| COCO-Stuff test | OCR (HRNetV2-W48) | mIoU | 40.5 | — | Unverified |
| LIP val | OCR (HRNetV2-W48) | mIoU | 56.65 | — | Unverified |
| LIP val | OCR (ResNet-101) | mIoU | 55.6 | — | Unverified |
| LIP val | HRNetV2 + OCR + RMI (PaddleClas pretrained) | mIoU | 58.2 | — | Unverified |
| PASCAL Context | OCR (ResNet-101) | mIoU | 54.8 | — | Unverified |
| PASCAL Context | OCR (HRNetV2-W48) | mIoU | 56.2 | — | Unverified |
| PASCAL Context | HRNetV2 + OCR + RMI (PaddleClas pretrained) | mIoU | 59.6 | — | Unverified |
| PASCAL VOC 2012 test | OCR (HRNetV2-W48) | Mean IoU | 84.5 | — | Unverified |
| PASCAL VOC 2012 test | OCR (ResNet-101) | Mean IoU | 84.3 | — | Unverified |

Reproductions