Referring Video Object Segmentation

Referring video object segmentation aims to segment the object in a video that a natural-language expression refers to. Unlike conventional video object segmentation, the task uses a different form of supervision, the language expression itself, to identify and segment the referred object throughout the video.

Papers

Showing 1–25 of 74 papers

Title	Date	Tasks	Status	Hype	Score
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation	Apr 7, 2025	Inference Optimization, Referring Video Object Segmentation	Code Available	5	5
4th PVUW MeViS 3rd Place Report: Sa2VA	Apr 1, 2025	Language Modeling, Language Modelling	Code Available	5	5
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos	Jan 7, 2025	2k, Language Modeling	Code Available	5	5
LISA: Reasoning Segmentation via Large Language Model	Aug 1, 2023	Language Modeling, Language Modelling	Code Available	4	5
VISA: Reasoning Video Object Segmentation via Large Language Models	Jul 16, 2024	Decoder, Object	Code Available	3	5
Tracking Anything with Decoupled Video Segmentation	Sep 7, 2023	Open-Vocabulary Video Segmentation, Open-World Video Segmentation	Code Available	3	5
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation	Nov 26, 2024	Natural Language Understanding, Referring Video Object Segmentation	Code Available	3	5
Universal Instance Perception as Object Discovery and Retrieval	Mar 12, 2023	Described Object Detection, Generalized Referring Expression Comprehension	Code Available	3	5
General Object Foundation Model for Images and Videos at Scale	Dec 14, 2023	Instance Segmentation, Long-tail Video Object Segmentation	Code Available	3	5
UniVS: Unified and Universal Video Segmentation with Prompts as Queries	Feb 28, 2024	Decoder, Referring Expression Segmentation	Code Available	3	5
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation	Oct 28, 2022	Referring Expression Segmentation, Referring Video Object Segmentation	Code Available	2	5
VideoMolmo: Spatio-Temporal Grounding Meets Pointing	Jun 5, 2025	Autonomous Driving, Autonomous Navigation	Code Available	2	5
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions	Aug 16, 2023	Motion Expressions Guided Video Segmentation, Object	Code Available	2	5
HyperSeg: Towards Universal Visual Segmentation with Large Language Model	Nov 26, 2024	Language Modeling, Large Language Model	Code Available	2	5
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation	Jan 15, 2025	Reasoning Segmentation, Referring Expression Segmentation	Code Available	2	5
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces	Dec 25, 2023	Image Segmentation, Object	Code Available	2	5
Language as Queries for Referring Video Object Segmentation	Jan 3, 2022	Object, Object Tracking	Code Available	2	5
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation	Mar 5, 2025	Object, Referring Video Object Segmentation	Code Available	2	5
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	Apr 10, 2025	Contrastive Learning, Language Modeling	Code Available	2	5
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation	Apr 4, 2024	Contrastive Learning, Referring Expression	Code Available	2	5
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos	Sep 29, 2024	All, Image Segmentation	Code Available	2	5
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation	Mar 18, 2024	Referring Video Object Segmentation, Semantic Segmentation	Code Available	1	5
ActionVOS: Actions as Prompts for Video Object Segmentation	Jul 10, 2024	Object, Referring Video Object Segmentation	Code Available	1	5
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus	Jul 4, 2022	Referring Expression Segmentation, Referring Video Object Segmentation	Code Available	1	5
End-to-End Referring Video Object Segmentation with Multimodal Transformers	Nov 29, 2021	Inductive Bias, Instance Segmentation	Code Available	1	5

No leaderboard results yet.