SOTAVerified

Referring Expression Segmentation

The task is to label the pixels of an image or video that belong to an object instance referred to by a linguistic expression. The referring expression (RE) must unambiguously identify a single object in the scene or discourse (the referent).
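Concretely, a system receives an image and an RE and outputs a binary mask over the image's pixels; predictions are commonly scored against a ground-truth mask by region overlap. A minimal sketch in NumPy (the toy masks and the `iou` helper are illustrative, not any particular benchmark's implementation):

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity (Jaccard index) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, gt).sum() / union

# Toy example: the referent occupies the top-left 2x2 block of a 4x4 image.
gt = np.zeros((4, 4), dtype=bool)
gt[:2, :2] = True
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :3] = True  # prediction over-segments by one column

print(round(iou(pred, gt), 3))  # → 0.667
```

The same scoring applies per expression: each RE yields one mask, so a dataset score is simply the mean IoU (or a threshold-based precision) over all image-expression pairs.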

Papers

Showing 1–10 of 145 papers

Title | Status | Hype
DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | Code | 1
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | — | 0
Refer to Anything with Vision-Language Prompts | — | 0
RemoteSAM: Towards Segment Anything for Earth Observation | Code | 3
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning | Code | 4
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation | — | 0
3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation | — | 0
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Code | 0
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding | Code | 2
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | UNINEXT-H | J&F 1st frame | 72.5 | — | Unverified
2 | HyperSeg | J&F 1st frame | 71.2 | — | Unverified
3 | DEVA (ReferFormer) | J&F 1st frame | 66.3 | — | Unverified
4 | HTR | J&F 1st frame | 65.6 | — | Unverified
5 | VATEX | J&F score | 65.4 | — | Unverified
6 | SgMg | J&F 1st frame | 63.3 | — | Unverified
7 | SafaRi-B | J&F 1st frame | 61.3 | — | Unverified
8 | ReferFormer | J&F 1st frame | 61.1 | — | Unverified
9 | PolyFormer-B | J&F 1st frame | 60.9 | — | Unverified
10 | UniVS (Swin-L) | J&F Full video | 59.4 | — | Unverified
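The J&F metric in the table is the standard video-segmentation score: the mean of region similarity J (Jaccard index, i.e. mask IoU) and boundary accuracy F (an F-measure over mask contours matched within a small pixel tolerance). A hedged NumPy sketch of the idea follows; the helper names and the simple 4-neighbour morphology are illustrative assumptions, and official evaluation toolkits compute F with proper distance transforms rather than this crude dilation:

```python
import numpy as np

def iou(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else np.logical_and(pred, gt).sum() / union

def boundary(mask):
    """Boundary pixels: mask pixels with at least one background 4-neighbour."""
    p = np.pad(mask.astype(bool), 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask.astype(bool) & ~interior

def dilate(mask, r):
    """Grow a mask by r steps of 4-connected dilation (tolerance band)."""
    m = mask.astype(bool)
    for _ in range(r):
        p = np.pad(m, 1, constant_values=False)
        m = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
             | p[1:-1, :-2] | p[1:-1, 2:])
    return m

def boundary_f(pred, gt, tol=1):
    """Boundary accuracy F: F-measure between contours within tol pixels."""
    pb, gb = boundary(pred), boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:
        return 1.0
    precision = (pb & dilate(gb, tol)).sum() / max(pb.sum(), 1)
    recall = (gb & dilate(pb, tol)).sum() / max(gb.sum(), 1)
    denom = precision + recall
    return 0.0 if denom == 0 else 2 * precision * recall / denom

def j_and_f(pred, gt):
    """J&F: the mean of region similarity and boundary accuracy."""
    return (iou(pred, gt) + boundary_f(pred, gt)) / 2

# Toy example: ground truth is a 4x4 block; the prediction adds one column.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 2:7] = True

print(round(j_and_f(pred, gt), 3))  # → 0.9 (J = 0.8, F = 1.0 at tol=1)
```

The "1st frame" and "Full video" qualifiers in the table refer to where the metric is averaged: referring video benchmarks differ in whether the expression is grounded only in the annotated first frame or tracked across every frame of the clip.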