SOTAVerified

Referring Video Object Segmentation

Referring video object segmentation aims to segment the object in a video that a given language expression refers to. Unlike conventional video object segmentation, the task uses a different type of supervision, language expressions, to identify and segment the referred object.

Papers

Showing 1-50 of 74 papers

Title | Status | Hype
4th PVUW MeViS 3rd Place Report: Sa2VA | Code | 5
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Code | 5
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Code | 5
LISA: Reasoning Segmentation via Large Language Model | Code | 4
Tracking Anything with Decoupled Video Segmentation | Code | 3
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
UniVS: Unified and Universal Video Segmentation with Prompts as Queries | Code | 3
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Code | 3
VISA: Reasoning Video Object Segmentation via Large Language Models | Code | 3
General Object Foundation Model for Images and Videos at Scale | Code | 3
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Code | 2
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Code | 2
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Code | 2
HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Code | 2
Language as Queries for Referring Video Object Segmentation | Code | 2
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Code | 2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Code | 2
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Code | 2
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Code | 2
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | Code | 2
Tracking with Human-Intent Reasoning | Code | 1
ActionVOS: Actions as Prompts for Video Object Segmentation | Code | 1
Multi-Attention Network for Compressed Video Referring Object Segmentation | Code | 1
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation | Code | 1
Referring Video Object Segmentation via Language-aligned Track Selection | Code | 1
1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation | Code | 1
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation | Code | 1
Local-Global Context Aware Transformer for Language-Guided Video Segmentation | Code | 1
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation | Code | 1
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | Code | 1
1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation | Code | 1
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation | Code | 1
Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | Code | 1
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
Spectrum-guided Multi-granularity Referring Video Object Segmentation | Code | 1
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | Code | 1
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | Code | 1
1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation | Code | 1
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation | Code | 1
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus | Code | 1
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | Code | 0
Cross-Modal Self-Attention Network for Referring Image Segmentation | Code | 0
Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation | Code | 0
Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples | Code | 0
DTOS: Dynamic Time Object Sensing with Large Multimodal Model | Code | 0
Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation | Code | 0
Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching | Code | 0
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Code | 0

No leaderboard results yet.