SOTAVerified

Referring Video Object Segmentation

Referring video object segmentation aims to segment the object in a video that a natural-language expression refers to. Unlike conventional video object segmentation, the task relies on a different type of supervision, language expressions, to identify and segment the referred object.

Papers

Showing 11–20 of 74 papers

Title | Status | Hype
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Code | 2
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Code | 2
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Code | 2
HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Code | 2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Code | 2
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Code | 2
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Code | 2
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Code | 2
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | Code | 2

No leaderboard results yet.