MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation
Minhyun Lee, Seungho Lee, Song Park, Dongyoon Han, Byeongho Heo, Hyunjung Shim
- Code: github.com/naver-ai/maskris (official implementation, linked in the paper; PyTorch; ★ 18)
Abstract
Referring Image Segmentation (RIS) is an advanced vision-language task that involves identifying and segmenting objects within an image as described by free-form text descriptions. While previous studies have focused on aligning visual and language features, training techniques such as data augmentation remain underexplored. In this work, we explore effective data augmentation for RIS and propose a novel training framework called Masked Referring Image Segmentation (MaskRIS). We observe that conventional image augmentations are ill-suited to RIS and degrade its performance, whereas simple random masking significantly improves it. MaskRIS uses both image and text masking, followed by Distortion-aware Contextual Learning (DCL) to fully exploit the benefits of the masking strategy. This approach can improve the model's robustness to occlusions, incomplete information, and various linguistic complexities, leading to significant performance gains. Experiments demonstrate that MaskRIS can easily be applied to various RIS models, outperforming existing methods in both fully supervised and weakly supervised settings. Finally, MaskRIS achieves new state-of-the-art performance on the RefCOCO, RefCOCO+, and RefCOCOg datasets. Code is available at https://github.com/naver-ai/maskris.
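The abstract says only that MaskRIS randomly masks both the image and the text. The sketch below illustrates that general idea on toy inputs: square image patches are filled with a constant, and a fraction of words are replaced by a placeholder token. The function names, patch size, mask ratios, fill value, and `"[MASK]"` token are all hypothetical choices for illustration, not the paper's settings, and Distortion-aware Contextual Learning (DCL) is not sketched. In this sketch the ground-truth segmentation mask is left untouched, which is an assumption.

```python
import random
from typing import List, Optional, Sequence


def mask_image_patches(
    image: List[List[float]],
    patch_size: int = 16,        # hypothetical patch size
    mask_ratio: float = 0.5,     # hypothetical fraction of patches to mask
    fill: float = 0.0,           # value written into masked pixels
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return a copy of a 2-D image with a random subset of square patches filled."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    # Top-left corners of every patch on a non-overlapping grid.
    corners = [(r, c) for r in range(0, h, patch_size) for c in range(0, w, patch_size)]
    n_mask = round(mask_ratio * len(corners))
    masked = [row[:] for row in image]  # copy so the input image is not modified
    for r0, c0 in rng.sample(corners, n_mask):
        for r in range(r0, min(r0 + patch_size, h)):
            for c in range(c0, min(c0 + patch_size, w)):
                masked[r][c] = fill
    return masked


def mask_text_tokens(
    tokens: Sequence[str],
    mask_ratio: float = 0.15,    # hypothetical fraction of words to mask
    mask_token: str = "[MASK]",  # hypothetical placeholder token
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a copy of the token list with a random subset replaced by mask_token."""
    rng = rng or random.Random()
    n_mask = round(mask_ratio * len(tokens))
    chosen = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_token if i in chosen else t for i, t in enumerate(tokens)]


# Usage: mask both inputs of one training sample; the target mask is not changed here.
rng = random.Random(0)
image = [[1.0] * 4 for _ in range(4)]
masked_image = mask_image_patches(image, patch_size=2, mask_ratio=0.5, rng=rng)
words = "the man in the red shirt on the left".split()
masked_words = mask_text_tokens(words, mask_ratio=0.3, rng=rng)
```

With a 4x4 image and 2x2 patches there are four patches, so a ratio of 0.5 fills two of them (8 pixels); a ratio of 0.3 on the 9-word expression masks 3 words. Because masking only removes information instead of altering it, words such as "red" or "left" are never made to contradict the image, which is one plausible reading of why masking suits RIS better than color or flip augmentations.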
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| RefCOCO val | MaskRIS (Swin-B) | Overall IoU (%) | 76.49 | n/a | Unverified |
| RefCOCO val | MaskRIS (Swin-B) | Overall IoU (%) | 67.54 | n/a | Unverified |
| RefCOCO val | MaskRIS (Swin-B, combined DB) | Overall IoU (%) | 78.71 | n/a | Unverified |
| RefCOCO val | MaskRIS (Swin-B, combined DB) | Overall IoU (%) | 70.26 | n/a | Unverified |
| RefCOCO testA | MaskRIS (Swin-B) | Overall IoU (%) | 78.96 | n/a | Unverified |
| RefCOCO testA | MaskRIS (Swin-B) | Overall IoU (%) | 74.46 | n/a | Unverified |
| RefCOCO testA | MaskRIS (Swin-B, combined DB) | Overall IoU (%) | 80.64 | n/a | Unverified |
| RefCOCO testA | MaskRIS (Swin-B, combined DB) | Overall IoU (%) | 75.15 | n/a | Unverified |
| RefCOCO testB | MaskRIS (Swin-B) | Overall IoU (%) | 73.96 | n/a | Unverified |
| RefCOCO testB | MaskRIS (Swin-B, combined DB) | Overall IoU (%) | 75.1 | n/a | Unverified |
| RefCOCO+ testB | MaskRIS (Swin-B) | Overall IoU (%) | 59.39 | n/a | Unverified |
| RefCOCO+ testB | MaskRIS (Swin-B, combined DB) | Overall IoU (%) | 62.83 | n/a | Unverified |
| RefCOCOg val | MaskRIS (Swin-B) | Overall IoU (%) | 65.55 | n/a | Unverified |
| RefCOCOg val | MaskRIS (Swin-B, combined DB) | Overall IoU (%) | 69.12 | n/a | Unverified |
| RefCOCOg test | MaskRIS (Swin-B) | Overall IoU (%) | 66.5 | n/a | Unverified |
| RefCOCOg test | MaskRIS (Swin-B, combined DB) | Overall IoU (%) | 71.09 | n/a | Unverified |
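All results above are reported as Overall IoU (oIoU). In RIS evaluation this metric is commonly computed by summing intersections and unions over every sample in a split and taking one ratio, as opposed to mean IoU, which averages per-sample ratios. The sketch below assumes MaskRIS follows that common convention; masks are toy flattened binary lists, and the helper names are hypothetical. `mean_iou` assumes every ground-truth mask is non-empty, so each union is positive.

```python
from typing import Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask: 1 = object pixel, 0 = background


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Pixel counts of |pred AND gt| and |pred OR gt| for one sample."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def overall_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Cumulative intersection over cumulative union across the split, in percent."""
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
    return 100.0 * total_inter / total_union


def mean_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Per-sample IoU averaged over the split, in percent (assumes non-empty unions)."""
    ious = []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        ious.append(inter / union)
    return 100.0 * sum(ious) / len(ious)


preds = [[1, 1, 1, 1, 0, 0, 0, 0], [1, 0]]
gts = [[1, 1, 1, 1, 1, 1, 0, 0], [0, 1]]
print(overall_iou(preds, gts))  # 4 / 8 -> 50.0
print(mean_iou(preds, gts))     # (4/6 + 0/2) / 2 -> about 33.33
```

Because oIoU pools pixels, large objects dominate it: the completely wrong prediction on the small second sample costs 50 points of mIoU contribution but only adds 2 pixels to the pooled union.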