SOTAVerified

Referring Video Object Segmentation

Referring video object segmentation aims to segment an object in a video given a language expression that describes it. Unlike conventional video object segmentation, the task relies on a different type of supervision, language expressions, to identify and segment the referred object.

Papers

Showing 1-25 of 74 papers

Title | Status | Hype
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Code | 5
4th PVUW MeViS 3rd Place Report: Sa2VA | Code | 5
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Code | 5
LISA: Reasoning Segmentation via Large Language Model | Code | 4
VISA: Reasoning Video Object Segmentation via Large Language Models | Code | 3
Tracking Anything with Decoupled Video Segmentation | Code | 3
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Code | 3
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
General Object Foundation Model for Images and Videos at Scale | Code | 3
UniVS: Unified and Universal Video Segmentation with Prompts as Queries | Code | 3
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | Code | 2
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Code | 2
HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Code | 2
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Code | 2
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Code | 2
Language as Queries for Referring Video Object Segmentation | Code | 2
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Code | 2
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Code | 2
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Code | 2
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Code | 2
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | Code | 1
ActionVOS: Actions as Prompts for Video Object Segmentation | Code | 1
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1

No leaderboard results yet.