SOTAVerified

Referring Video Object Segmentation

Referring video object segmentation (RVOS) aims to segment an object in a video given a language expression that describes it. Unlike conventional video object segmentation, the task relies on a different type of supervision, the language expression, to identify and segment the referred object in the video.

Papers

Showing 51-74 of 74 papers

Title | Status | Hype
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations |  | 0
Bidirectional Correlation-Driven Inter-Frame Interaction Transformer for Referring Video Object Segmentation |  | 0
Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation |  | 0
Robust Referring Video Object Segmentation with Cyclic Structural Consensus |  | 0
Decoupled Motion Expression Video Segmentation |  | 0
Segment Every Reference Object in Spatial and Temporal Spaces |  | 0
Semantic and Sequential Alignment for Referring Video Object Segmentation |  | 0
Temporal Collection and Distribution for Referring Video Object Segmentation |  | 0
The 2nd Solution for LSVOS Challenge RVOS Track: Spatial-temporal Refinement for Consistent Semantic Segmentation |  | 0
The Instance-centric Transformer for the RVOS Track of LSVOS Challenge: 3rd Place Solution |  | 0
The Second Place Solution for The 4th Large-scale Video Object Segmentation Challenge--Track 3: Referring Video Object Segmentation |  | 0
3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation |  | 0
Fully Transformer-Equipped Architecture for End-to-End Referring Video Object Segmentation |  | 0
Harnessing Vision-Language Pretrained Models with Temporal-Aware Adaptation for Referring Video Object Segmentation |  | 0
HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation |  | 0
Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples | Code | 0
Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation | Code | 0
Cross-Modal Self-Attention Network for Referring Image Segmentation | Code | 0
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Code | 0
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Code | 0
Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching | Code | 0
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | Code | 0
Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation | Code | 0
DTOS: Dynamic Time Object Sensing with Large Multimodal Model | Code | 0

No leaderboard results yet.