
Referring Video Object Segmentation

Referring video object segmentation (RVOS) aims to segment an object in a video based on a natural-language expression. Unlike conventional video object segmentation, the task relies on a different type of supervision, a language expression, to identify the object that the expression refers to and segment it in the video.
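To make the task's input/output contract concrete, here is a minimal Python sketch under stated assumptions: a clip is a list of equally sized RGB frames, the expression is a plain string, and a prediction is one binary mask per frame. The names `RVOSQuery`, `RVOSModel`, `is_valid_output`, and `EmptyMaskBaseline` are hypothetical and do not come from any of the papers listed on this page. The baseline simply predicts "no object" everywhere and only serves to exercise the interface.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple

# Assumed representations (hypothetical, for illustration only).
Pixel = Tuple[int, int, int]  # one RGB value
Frame = List[List[Pixel]]     # H x W image
Mask = List[List[bool]]       # H x W binary mask, True = referred object


@dataclass
class RVOSQuery:
    """A video clip plus the language expression that names the target."""
    frames: List[Frame]  # T frames, all with the same H x W size
    expression: str      # e.g. "the dog running to the left"

    @property
    def size(self) -> Tuple[int, int, int]:
        """Return (T, H, W) for the clip."""
        return len(self.frames), len(self.frames[0]), len(self.frames[0][0])


class RVOSModel(Protocol):
    """Any model that maps a query to one mask per frame."""

    def segment(self, query: RVOSQuery) -> List[Mask]: ...


def is_valid_output(query: RVOSQuery, masks: List[Mask]) -> bool:
    """Check that there is exactly one H x W mask for each of the T frames."""
    t, h, w = query.size
    return len(masks) == t and all(
        len(mask) == h and all(len(row) == w for row in mask) for mask in masks
    )


class EmptyMaskBaseline:
    """Hypothetical placeholder that predicts 'object absent' in every frame."""

    def segment(self, query: RVOSQuery) -> List[Mask]:
        t, h, w = query.size
        return [[[False] * w for _ in range(h)] for _ in range(t)]


if __name__ == "__main__":
    black = [[(0, 0, 0)] * 4 for _ in range(3)]  # one 3 x 4 black frame
    query = RVOSQuery(frames=[black] * 2, expression="the dog running to the left")
    model: RVOSModel = EmptyMaskBaseline()
    masks = model.segment(query)
    print(query.size, is_valid_output(query, masks))  # (2, 3, 4) True
```

In this sketch the only task-specific input besides the frames is the free-form `expression` string; everything model-specific is left behind the `segment` method, so any real RVOS method would replace `EmptyMaskBaseline`.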

Papers

Showing 1-50 of 74 papers

Title | Status | Hype
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Code | 5
4th PVUW MeViS 3rd Place Report: Sa2VA | Code | 5
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Code | 5
LISA: Reasoning Segmentation via Large Language Model | Code | 4
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Code | 3
VISA: Reasoning Video Object Segmentation via Large Language Models | Code | 3
UniVS: Unified and Universal Video Segmentation with Prompts as Queries | Code | 3
General Object Foundation Model for Images and Videos at Scale | Code | 3
Tracking Anything with Decoupled Video Segmentation | Code | 3
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Code | 2
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Code | 2
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Code | 2
HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Code | 2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Code | 2
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Code | 2
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Code | 2
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Code | 2
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | Code | 2
Language as Queries for Referring Video Object Segmentation | Code | 2
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | Code | 1
Referring Video Object Segmentation via Language-aligned Track Selection | Code | 1
ActionVOS: Actions as Prompts for Video Object Segmentation | Code | 1
1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation | Code | 1
Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | Code | 1
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | Code | 1
1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation | Code | 1
Tracking with Human-Intent Reasoning | Code | 1
Spectrum-guided Multi-granularity Referring Video Object Segmentation | Code | 1
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation | Code | 1
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation | Code | 1
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation | Code | 1
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | Code | 1
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation | Code | 1
1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation | Code | 1
Multi-Attention Network for Compressed Video Referring Object Segmentation | Code | 1
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus | Code | 1
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation | Code | 1
Local-Global Context Aware Transformer for Language-Guided Video Segmentation | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | Code | 1
InterRVOS: Interaction-aware Referring Video Object Segmentation | - | 0
Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation | - | 0
Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching | Code | 0
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Code | 0
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | - | 0
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Code | 0
Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation | Code | 0
DTOS: Dynamic Time Object Sensing with Large Multimodal Model | Code | 0

No leaderboard results yet.