
Referring Video Object Segmentation

Referring video object segmentation (RVOS) aims to segment an object in a video based on a natural-language expression. Unlike conventional video object segmentation, which relies on other forms of supervision (such as a mask annotated in the first frame), this task uses language expressions as its supervision signal: the model must identify the object that the given expression refers to and segment it throughout the video.
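To make the task definition concrete, here is a minimal Python sketch of its input/output contract. It assumes that a sample is a list of frames plus one referring expression and that a model returns one binary mask per frame. The names `ReferringVideoSample`, `RVOSModel`, `frame_iou`, and `mean_iou` are hypothetical. Frames are reduced to single-channel grids for brevity. The per-frame Jaccard index (intersection over union) appears only as one common way to compare predicted masks with the referred object's ground truth; this page does not specify an evaluation metric.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types: a frame or mask is an H x W grid; masks hold 0/1 values.
Grid = List[List[int]]


@dataclass
class ReferringVideoSample:
    """One RVOS input: T video frames plus a referring expression."""
    frames: List[Grid]   # T single-channel frames (simplified; real frames are RGB)
    expression: str      # e.g. "the dog jumping over the fence"


# Any RVOS method can be viewed as a function that maps a sample to T masks,
# one per frame, marking the pixels of the referred object.
RVOSModel = Callable[[ReferringVideoSample], List[Grid]]


def frame_iou(pred: Grid, gt: Grid) -> float:
    """Jaccard index (intersection over union) of two binary masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += bool(p) and bool(g)
            union += bool(p) or bool(g)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(model: RVOSModel, sample: ReferringVideoSample,
             gt_masks: List[Grid]) -> float:
    """Run the model on one sample and average the per-frame Jaccard index."""
    preds = model(sample)
    if len(preds) != len(sample.frames) or len(gt_masks) != len(sample.frames):
        raise ValueError("expected exactly one mask per frame")
    return sum(frame_iou(p, g) for p, g in zip(preds, gt_masks)) / len(gt_masks)


def predict_everything(sample: ReferringVideoSample) -> List[Grid]:
    """Trivial placeholder 'model' that marks every pixel as the object."""
    return [[[1 for _ in row] for row in frame] for frame in sample.frames]


if __name__ == "__main__":
    # Toy 2-frame video of 2 x 2 pixels; the referred object grows in frame 2.
    sample = ReferringVideoSample(frames=[[[0, 0], [0, 0]]] * 2,
                                  expression="the object in the top row")
    gt = [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]
    print(mean_iou(predict_everything, sample, gt))  # 0.375
```

Typing the model as a plain callable keeps the sketch independent of any particular architecture. Only the input (frames plus expression) and the output (one mask per frame) are fixed.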

Papers

Showing 1-50 of 74 papers

| Title | Status | Hype |
|---|---|---|
| VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2 |
| InterRVOS: Interaction-aware Referring Video Object Segmentation |  | 0 |
| Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation |  | 0 |
| Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching | Code | 0 |
| GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Code | 2 |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Code | 5 |
| 4th PVUW MeViS 3rd Place Report: Sa2VA | Code | 5 |
| ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Code | 0 |
| Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Code | 2 |
| ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations |  | 0 |
| MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | Code | 1 |
| InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Code | 0 |
| The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Code | 2 |
| Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation | Code | 0 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Code | 5 |
| DTOS: Dynamic Time Object Sensing with Large Multimodal Model | Code | 0 |
| Semantic and Sequential Alignment for Referring Video Object Segmentation |  | 0 |
| Decoupled Motion Expression Video Segmentation |  | 0 |
| Referring Video Object Segmentation via Language-aligned Track Selection | Code | 1 |
| SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Code | 3 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Code | 2 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Code | 2 |
| LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation |  | 0 |
| The 2nd Solution for LSVOS Challenge RVOS Track: Spatial-temporal Refinement for Consistent Semantic Segmentation |  | 0 |
| The Instance-centric Transformer for the RVOS Track of LSVOS Challenge: 3rd Place Solution |  | 0 |
| UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track |  | 0 |
| VISA: Reasoning Video Object Segmentation via Large Language Models | Code | 3 |
| ActionVOS: Actions as Prompts for Video Object Segmentation | Code | 1 |
| 2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation |  | 0 |
| GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation |  | 0 |
| 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation | Code | 1 |
| 3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation |  | 0 |
| Harnessing Vision-Language Pretrained Models with Temporal-Aware Adaptation for Referring Video Object Segmentation |  | 0 |
| Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | Code | 0 |
| Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Code | 2 |
| Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | Code | 1 |
| Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | Code | 1 |
| UniVS: Unified and Universal Video Segmentation with Prompts as Queries | Code | 3 |
| 1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation | Code | 1 |
| Tracking with Human-Intent Reasoning | Code | 1 |
| UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Code | 2 |
| General Object Foundation Model for Images and Videos at Scale | Code | 3 |
| Fully Transformer-Equipped Architecture for End-to-End Referring Video Object Segmentation |  | 0 |
| Temporal Collection and Distribution for Referring Video Object Segmentation |  | 0 |
| Tracking Anything with Decoupled Video Segmentation | Code | 3 |
| Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples | Code | 0 |
| MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Code | 2 |
| Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation | Code | 0 |
| Learning Referring Video Object Segmentation from Weak Annotation |  | 0 |
| LISA: Reasoning Segmentation via Large Language Model | Code | 4 |

No leaderboard results yet.