Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching

2025-04-18 · Code Available

Heng Liu, Guanghui Li, Mingqi Gao, XianTong Zhen, Feng Zheng, Yang Wang

Abstract

Referring video object segmentation (RVOS) aims to segment objects in videos guided by natural language descriptions. We propose FS-RVOS, a Transformer-based model for the few-shot setting with two key components: a cross-modal affinity module and an instance sequence matching strategy. The matching strategy extends FS-RVOS to multi-object segmentation (FS-RVMOS). Experiments show that FS-RVOS and FS-RVMOS outperform state-of-the-art methods across diverse benchmarks, demonstrating better robustness and accuracy.
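The abstract names instance sequence matching but does not describe how it works. As a rough illustration only, the sketch below shows one common way such matching is done in query-based video segmentation: each candidate mask sequence is assigned to at most one target sequence so that the summed sequence-level IoU is maximal. Everything here is an assumption for illustration, not the paper's method: the function names `sequence_iou` and `match_instance_sequences`, the mean-IoU score, the set-of-pixels mask encoding, and the brute-force search (practical only for a handful of instances) are all hypothetical.

```python
from itertools import permutations


def sequence_iou(pred_seq, gt_seq):
    """Mean per-frame IoU between two mask sequences.

    Each mask is a set of (row, col) foreground pixels (an assumed,
    simplified encoding). Sequences must contain at least one frame.
    """
    scores = []
    for pred, gt in zip(pred_seq, gt_seq):
        union = pred | gt
        # Two empty masks agree perfectly (object absent in this frame).
        scores.append(len(pred & gt) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def match_instance_sequences(pred_seqs, gt_seqs):
    """One-to-one matching of predicted sequences to target sequences.

    Returns (assignment, total_iou), where assignment[g] is the index of
    the predicted sequence matched to target g. Brute force over all
    injective assignments; illustrative, not the paper's algorithm.
    """
    assert len(pred_seqs) >= len(gt_seqs), "need at least one candidate per target"
    best, best_score = None, -1.0
    for perm in permutations(range(len(pred_seqs)), len(gt_seqs)):
        score = sum(
            sequence_iou(pred_seqs[p], gt_seqs[g]) for g, p in enumerate(perm)
        )
        if score > best_score:
            best, best_score = perm, score
    return list(best), best_score


# Toy example: two target objects tracked over two frames.
obj_a = [{(0, 0), (0, 1)}, {(0, 1)}]
obj_b = [{(2, 2)}, {(2, 2), (2, 3)}]
candidates = [
    [{(2, 2)}, {(2, 2)}],  # imperfect prediction of obj_b
    obj_a,                 # exact prediction of obj_a
    [set(), set()],        # empty "no object" candidate
]
assignment, total_iou = match_instance_sequences(candidates, [obj_a, obj_b])
print(assignment, total_iou)  # [1, 0] 1.75
```

Matching whole sequences rather than individual frames keeps each object's identity consistent across the clip. For larger instance counts, a polynomial-time assignment solver would replace the brute-force loop.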
