SOTAVerified

Referring Video Object Segmentation

Referring video object segmentation aims to segment an object in a video given a language expression that describes it. Unlike conventional video object segmentation, the task relies on a different type of supervision: the language expression is used to identify the referred object, which is then segmented throughout the video.

Papers

Showing 1-25 of 74 papers

Title | Status | Hype
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Code | 5
4th PVUW MeViS 3rd Place Report: Sa2VA | Code | 5
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Code | 5
LISA: Reasoning Segmentation via Large Language Model | Code | 4
VISA: Reasoning Video Object Segmentation via Large Language Models | Code | 3
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Code | 3
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
UniVS: Unified and Universal Video Segmentation with Prompts as Queries | Code | 3
General Object Foundation Model for Images and Videos at Scale | Code | 3
Tracking Anything with Decoupled Video Segmentation | Code | 3
Language as Queries for Referring Video Object Segmentation | Code | 2
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | Code | 2
HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Code | 2
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Code | 2
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Code | 2
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Code | 2
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Code | 2
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Code | 2
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Code | 2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Code | 2
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | Code | 1
ActionVOS: Actions as Prompts for Video Object Segmentation | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation | Code | 1
Page 1 of 3

No leaderboard results yet.