SOTAVerified

Referring Video Object Segmentation

Referring video object segmentation aims to segment an object in a video given a language expression. Unlike conventional video object segmentation, the task relies on a different type of supervision, language expressions, to identify and segment the object that the given expression refers to in the video.

Papers

Showing 26-50 of 74 papers

Title | Status | Hype
Referring Video Object Segmentation via Language-aligned Track Selection | Code | 1
1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation | Code | 1
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation | Code | 1
Local-Global Context Aware Transformer for Language-Guided Video Segmentation | Code | 1
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation | Code | 1
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | Code | 1
1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation | Code | 1
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation | Code | 1
Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | Code | 1
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
Spectrum-guided Multi-granularity Referring Video Object Segmentation | Code | 1
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | Code | 1
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | Code | 1
1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation | Code | 1
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation | Code | 1
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus | Code | 1
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | Code | 0
Cross-Modal Self-Attention Network for Referring Image Segmentation | Code | 0
Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation | Code | 0
Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples | Code | 0
DTOS: Dynamic Time Object Sensing with Large Multimodal Model | Code | 0
Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation | Code | 0
Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching | Code | 0
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Code | 0

No leaderboard results yet.