Referring Expression Segmentation

The task is to label the pixels of an image or video that belong to the object instance referred to by a linguistic expression. The referring expression (RE) must unambiguously identify a single object in the discourse or scene (the referent).

Papers

Showing 1–10 of 145 papers

Title	Date	Tasks	Status	Hype
DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy	Jul 2, 2025	Data Augmentation, Generalized Referring Expression Segmentation	Code Available	1
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval	Jun 28, 2025	Cross-Modal Retrieval, Image Captioning	Unverified	0
Refer to Anything with Vision-Language Prompts	Jun 5, 2025	Benchmarking, Generalized Referring Expression Segmentation	Unverified	0
RemoteSAM: Towards Segment Anything for Earth Observation	May 23, 2025	Attribute, Earth Observation	Code Available	3
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning	May 17, 2025	2D Object Detection, Object Counting	Code Available	4
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation	May 3, 2025	Attribute, Image Segmentation	Unverified	0
3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation	Apr 17, 2025	Referring Expression, Referring Expression Segmentation	Unverified	0
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities	Apr 2, 2025	Descriptive, Large Language Model	Code Available	0
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding	Mar 13, 2025	Diversity, Language Modeling	Code Available	2
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories	Mar 11, 2025	Decision Making, Interactive Segmentation	Code Available	2

Datasets: RefCoCo val, RefCOCO testA, Refer-YouTube-VOS (2021 public validation), RefCOCO+ test B, A2D Sentences, RefCOCOg-val, J-HMDB, DAVIS 2017 (val), RefCOCOg-test, RefCOCO testB, PhraseCut, RefCOCO

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	HINet	IoU overall	0.65	—	Unverified
2	ClawCraneNet	IoU overall	0.64	—	Unverified
3	CMSA+CFSA	IoU overall	0.63	—	Unverified
4	RefVOS	IoU overall	0.61	—	Unverified
5	SgMg (Video-Swin-B)	AP	0.45	—	Unverified
6	SOC (Video-Swin-B)	AP	0.45	—	Unverified
7	VLIDE	AP	0.44	—	Unverified
8	SOC (Video-Swin-T)	AP	0.40	—	Unverified
9	MTTR (w=10)	AP	0.39	—	Unverified
10	MTTR (w=8)	AP	0.37	—	Unverified
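
The "IoU overall" metric in the table is not defined on this page. A minimal sketch, assuming the definition commonly used in referring segmentation work (total intersection pixels divided by total union pixels, accumulated over all test samples), is given below alongside per-sample mean IoU for contrast. The function names and the list-of-rows mask representation are illustrative choices, not part of any benchmark's official evaluation code.

```python
from typing import Sequence, Tuple

# A binary mask as rows of 0/1 pixel labels (illustrative representation).
Mask = Sequence[Sequence[int]]


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count intersection and union pixels of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter, union


def overall_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Cumulative intersection over cumulative union across all samples.

    Assumed definition; large objects contribute more pixels and so
    weigh more heavily than small ones.
    """
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average of per-sample IoU; every sample counts equally."""
    scores = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        # Assumption: two empty masks count as a perfect match.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
    print(overall_iou(preds, targets))  # (1 + 1) / (2 + 1)
    print(mean_iou(preds, targets))     # (1/2 + 1/1) / 2
```

The two numbers differ on the same predictions, so scores reported as overall IoU are not directly comparable with scores reported as mean IoU.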