MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation

2024-11-28 · Code Available

Minhyun Lee, Seungho Lee, Song Park, Dongyoon Han, Byeongho Heo, Hyunjung Shim

Abstract

Referring Image Segmentation (RIS) is an advanced vision-language task that involves identifying and segmenting objects within an image as described by free-form text descriptions. While previous studies have focused on aligning visual and language features, training techniques such as data augmentation remain underexplored. In this work, we explore effective data augmentation for RIS and propose a novel training framework called Masked Referring Image Segmentation (MaskRIS). We observe that conventional image augmentations fall short for RIS, leading to performance degradation, while simple random masking significantly enhances RIS performance. MaskRIS uses both image and text masking, followed by Distortion-aware Contextual Learning (DCL) to fully exploit the benefits of the masking strategy. This approach can improve the model's robustness to occlusions, incomplete information, and various linguistic complexities, resulting in a significant performance improvement. Experiments demonstrate that MaskRIS can easily be applied to various RIS models, outperforming existing methods in both fully supervised and weakly supervised settings. Finally, MaskRIS achieves new state-of-the-art performance on the RefCOCO, RefCOCO+, and RefCOCOg datasets. Code is available at https://github.com/naver-ai/maskris.
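To make the masking idea in the abstract concrete, here is a minimal sketch of random image-patch masking and random word masking. It is not the authors' implementation: the function names, the patch size, the mask ratios, the fill value, and the "[MASK]" token are all assumptions chosen for illustration, and the DCL step is not covered because the abstract does not describe its details.

```python
import random


def mask_image_patches(image, patch_size, mask_ratio, rng, fill=0):
    """Return a copy of a 2D image (list of rows) with random patches filled.

    Hypothetical helper: the image is split into non-overlapping
    patch_size x patch_size patches and a fraction `mask_ratio` of them
    is overwritten with `fill`.
    """
    height, width = len(image), len(image[0])
    # Top-left corner of every patch in the grid.
    patches = [
        (row, col)
        for row in range(0, height, patch_size)
        for col in range(0, width, patch_size)
    ]
    num_masked = int(len(patches) * mask_ratio)
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for row, col in rng.sample(patches, num_masked):
        for y in range(row, min(row + patch_size, height)):
            for x in range(col, min(col + patch_size, width)):
                masked[y][x] = fill
    return masked


def mask_words(words, mask_ratio, rng, mask_token="[MASK]"):
    """Return a copy of `words` with a random fraction replaced by `mask_token`."""
    num_masked = int(len(words) * mask_ratio)
    positions = set(rng.sample(range(len(words)), num_masked))
    return [mask_token if i in positions else word for i, word in enumerate(words)]


# Example usage with assumed settings (4x4 image, 2x2 patches, 50% masking).
rng = random.Random(0)
image = [[1] * 4 for _ in range(4)]
masked_image = mask_image_patches(image, patch_size=2, mask_ratio=0.5, rng=rng)
masked_text = mask_words("the man in red".split(), mask_ratio=0.5, rng=rng)
```

In a training loop, the masked image and masked expression would be fed to the RIS model in place of, or alongside, the original pair; how the two views are combined is left open here because it depends on the method's details.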

Benchmark Results

Dataset          Model                          Metric       Claimed  Verified  Status
RefCOCOg-test    MaskRIS (Swin-B)               Overall IoU  66.5     -         Unverified
RefCOCOg-test    MaskRIS (Swin-B, combined DB)  Overall IoU  71.09    -         Unverified
RefCOCOg-val     MaskRIS (Swin-B)               Overall IoU  65.55    -         Unverified
RefCOCOg-val     MaskRIS (Swin-B, combined DB)  Overall IoU  69.12    -         Unverified
RefCOCO testA    MaskRIS (Swin-B)               Overall IoU  74.46    -         Unverified
RefCOCO testA    MaskRIS (Swin-B, combined DB)  Overall IoU  75.15    -         Unverified
RefCOCO testA    MaskRIS (Swin-B, combined DB)  Overall IoU  80.64    -         Unverified
RefCOCO testA    MaskRIS (Swin-B)               Overall IoU  78.96    -         Unverified
RefCOCO testB    MaskRIS (Swin-B, combined DB)  Overall IoU  75.1     -         Unverified
RefCOCO testB    MaskRIS (Swin-B)               Overall IoU  73.96    -         Unverified
RefCOCO+ test B  MaskRIS (Swin-B, combined DB)  Overall IoU  62.83    -         Unverified
RefCOCO+ test B  MaskRIS (Swin-B)               Overall IoU  59.39    -         Unverified
RefCOCO val      MaskRIS (Swin-B)               Overall IoU  76.49    -         Unverified
RefCOCO val      MaskRIS (Swin-B)               Overall IoU  67.54    -         Unverified
RefCOCO val      MaskRIS (Swin-B, combined DB)  Overall IoU  70.26    -         Unverified
RefCOCO val      MaskRIS (Swin-B, combined DB)  Overall IoU  78.71    -         Unverified
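
The page does not define the Overall IoU metric reported above. The sketch below assumes the definition commonly used for RIS benchmarks: the total intersection summed over all samples divided by the total union summed over all samples, which differs from averaging per-sample IoU. The function name and the nested-list mask format are illustrative assumptions, and the paper's evaluation code may differ.

```python
def overall_iou(predictions, targets):
    """Cumulative intersection over cumulative union across a dataset.

    Assumed definition: each element of `predictions` and `targets` is a
    binary mask given as a list of rows of 0/1 values with matching shapes.
    """
    total_intersection = 0
    total_union = 0
    for pred_mask, target_mask in zip(predictions, targets):
        for pred_row, target_row in zip(pred_mask, target_mask):
            for pred, target in zip(pred_row, target_row):
                if pred and target:
                    total_intersection += 1
                if pred or target:
                    total_union += 1
    # Guard against an empty dataset or all-empty masks.
    return total_intersection / total_union if total_union else 0.0


# Example usage on two tiny 2x2 masks.
preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
gts = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
score = overall_iou(preds, gts)  # intersection 1 + 4, union 2 + 4
```

Because the sums are accumulated before dividing, large objects contribute more to this score than small ones, which is worth keeping in mind when comparing it with mean per-sample IoU.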