SOTAVerified

Referring Expression Segmentation

The task is to label the pixels of an image or video that belong to the object instance referred to by a linguistic expression. The referring expression (RE) must unambiguously identify a single object (the referent) in the discourse or scene.

Papers

Showing 101-145 of 145 papers

Title | Status | Hype
Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation |  | 0
Local-Global Context Aware Transformer for Language-Guided Video Segmentation | Code | 1
Language as Queries for Referring Video Object Segmentation | Code | 2
Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation |  | 0
Image Segmentation Using Text and Image Prompts | Code | 1
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | Code | 1
CRIS: CLIP-Driven Referring Image Segmentation | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
Hierarchical interaction network for video object segmentation from referring expressions |  | 0
MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation |  | 0
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | Code | 1
Vision-Language Transformer and Query Generation for Referring Segmentation | Code | 1
SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation | Code | 1
Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Code | 1
Cross-Modal Progressive Comprehension for Referring Segmentation | Code | 1
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation |  | 0
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1
Comprehensive Multi-Modal Interactions for Referring Image Segmentation | Code | 0
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation |  | 0
OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding | Code | 1
Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network |  | 0
Actor and Action Modular Network for Text-based Video Segmentation |  | 0
RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | Code | 1
Referring Image Segmentation via Cross-Modal Progressive Comprehension | Code | 1
PhraseCut: Language-based Image Segmentation in the Wild | Code | 1
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | Code | 1
Polar Relative Positional Encoding for Video-Language Segmentation |  | 0
Visual-Textual Capsule Routing for Text-Based Video Segmentation |  | 0
Bi-Directional Relationship Inferring Network for Referring Image Segmentation |  | 0
Referring Image Segmentation by Generative Adversarial Learning |  | 0
Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries |  | 0
Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters | Code | 0
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | Code | 1
Recurrent Instance Segmentation using Sequences of Referring Expressions |  | 0
Referring Expression Object Segmentation with Caption-Aware Consistency | Code | 0
See-Through-Text Grouping for Referring Image Segmentation |  | 0
Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query | Code | 0
Cross-Modal Self-Attention Network for Referring Image Segmentation | Code | 0
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Code | 0
Referring Image Segmentation via Recurrent Refinement Networks | Code | 0
Video Object Segmentation with Language Referring Expressions |  | 0
Actor and Action Video Segmentation from a Sentence | Code | 1
MAttNet: Modular Attention Network for Referring Expression Comprehension | Code | 0
Tracking by Natural Language Specification |  | 0
Segmentation from Natural Language Expressions | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | DeRIS-L | Overall IoU | 85.41 |  | Unverified
2 | HyperSeg | Overall IoU | 84.8 |  | Unverified
3 | PSALM | Overall IoU | 83.6 |  | Unverified
4 | MLCD-Seg-7B | Overall IoU | 83.6 |  | Unverified
5 | HIPIE | Overall IoU | 82.8 |  | Unverified
6 | EVF-SAM | Overall IoU | 82.4 |  | Unverified
7 | UNINEXT-H | Overall IoU | 82.19 |  | Unverified
8 | UniLSeg-100 | Overall IoU | 81.74 |  | Unverified
9 | DETRIS | Overall IoU | 81 |  | Unverified
10 | C3VG | Overall IoU | 80.89 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DeRIS-L | Overall IoU | 86.49 |  | Unverified
2 | HyperSeg | Overall IoU | 85.7 |  | Unverified
3 | MLCD-Seg-7B | Overall IoU | 85.3 |  | Unverified
4 | EVF-SAM | Overall IoU | 84.2 |  | Unverified
5 | HyperSeg | Overall IoU | 83.5 |  | Unverified
6 | C3VG | Overall IoU | 83.18 |  | Unverified
7 | MLCD-Seg-7B | Overall IoU | 82.9 |  | Unverified
8 | DeRIS-L | Overall IoU | 82.34 |  | Unverified
9 | DETRIS | Overall IoU | 81.9 |  | Unverified
10 | MaskRIS (Swin-B, combined DB) | Overall IoU | 80.64 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MPG-SAM 2 | J&F | 73.9 |  | Unverified
2 | VRS-HQ (Chat-UniVi-13B) | J&F | 71 |  | Unverified
3 | GLEE-Pro | J&F | 70.6 |  | Unverified
4 | UNINEXT-H | J&F | 70.1 |  | Unverified
5 | ReferDINO (Swin-B) | J&F | 69.3 |  | Unverified
6 | MUTR | J&F | 68.4 |  | Unverified
7 | VLP (VLMo-L) | J&F | 67.6 |  | Unverified
8 | UniRef-L (Swin-L) | J&F | 67.4 |  | Unverified
9 | HTR (Pre-training) | J&F | 67.1 |  | Unverified
10 | DsHmp (Video-Swin-Base) | J&F | 67.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DeRIS-L | Mean IoU | 78.59 |  | Unverified
2 | MLCD-Seg-7B | Overall IoU | 75.6 |  | Unverified
3 | HyperSeg | Overall IoU | 75.2 |  | Unverified
4 | EVF-SAM | Overall IoU | 71.9 |  | Unverified
5 | DETRIS | Overall IoU | 70.2 |  | Unverified
6 | C3VG | Overall IoU | 68.95 |  | Unverified
7 | UniLSeg-100 | Overall IoU | 68.15 |  | Unverified
8 | UniLSeg-20 | Overall IoU | 66.99 |  | Unverified
9 | UNINEXT-H | Overall IoU | 66.22 |  | Unverified
10 | GROUNDHOG | Overall IoU | 64.9 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HINet | IoU overall | 0.68 |  | Unverified
2 | RefVOS | IoU overall | 0.67 |  | Unverified
3 | ClawCraneNet | IoU overall | 0.64 |  | Unverified
4 | CMSA+CFSA | IoU overall | 0.62 |  | Unverified
5 | RefVOS | IoU overall | 0.6 |  | Unverified
6 | SgMg (Video-Swin-B) | AP | 0.59 |  | Unverified
7 | SOC (Video-Swin-B) | AP | 0.57 |  | Unverified
8 | ReferFormer (Video-Swin-B) | AP | 0.55 |  | Unverified
9 | SOC (Video-Swin-T) | AP | 0.5 |  | Unverified
10 | MANET | AP | 0.47 |  | Unverified
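
The tables above report both "Overall IoU" and "Mean IoU" without defining them. As a rough sketch, assuming the definitions commonly used for referring segmentation (overall IoU as cumulative intersection over cumulative union across all samples, mean IoU as the average of per-sample IoU), the two can be computed as below. The function names and toy masks are illustrative assumptions, not taken from any benchmark's evaluation code, and individual benchmarks may differ in details such as reporting on a 0-1 or 0-100 scale.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 pixel labels


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter, union


def overall_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Cumulative intersection divided by cumulative union over all samples."""
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


def mean_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Average of the per-sample IoU values."""
    scores = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        scores.append(inter / union if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example (hypothetical data): two 2x2 predictions and ground-truth masks.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 1], [1, 1]], [[1, 0], [0, 0]]]

print(overall_iou(preds, targets))  # 3 / 5 = 0.6
print(mean_iou(preds, targets))     # (0.5 + 1.0) / 2 = 0.75
```

Because overall IoU pools pixels across samples, large objects weigh more heavily than small ones, whereas mean IoU weights every sample equally. This is one reason the two metrics are not directly comparable within a single ranking.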