Tracking Anything with Decoupled Video Segmentation
Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee
Code
- github.com/hkchengrex/Tracking-Anything-with-DEVA (official, PyTorch, ★ 1,488)
Abstract
Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation. Code is available at: https://hkchengrex.github.io/Tracking-Anything-with-DEVA
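The abstract describes a pipeline built from a task-specific image model plus a class-agnostic temporal propagation model, fused (semi-)online. The Python sketch below illustrates that control flow under our own assumptions; it is not the repository's API. `TrackState`, `segment_image`, `propagate`, `fuse`, and `detect_every` are hypothetical names, and the IoU-based matching is a deliberately simplified stand-in for the paper's in-clip consensus.

```python
# Hypothetical sketch of a decoupled video segmentation loop (not DEVA's
# actual implementation): a task-specific image model proposes masks every
# few frames, and a class-agnostic propagation model carries tracks forward.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

import numpy as np


@dataclass
class TrackState:
    """Masks currently being tracked, keyed by object id."""
    masks: Dict[int, np.ndarray] = field(default_factory=dict)
    next_id: int = 0


def iou(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def fuse(state: TrackState, proposals: List[np.ndarray],
         match_thresh: float = 0.5) -> None:
    """Merge image-level proposals into the propagated masks.

    Simplified stand-in for the paper's in-clip consensus: a proposal that
    overlaps an existing track refreshes that track's mask; an unmatched
    proposal starts a new track (how new objects enter the video).
    """
    for prop in proposals:
        best_id, best_iou = None, match_thresh
        for obj_id, mask in state.masks.items():
            score = iou(prop, mask)
            if score > best_iou:
                best_id, best_iou = obj_id, score
        if best_id is not None:
            state.masks[best_id] = prop          # refresh matched track
        else:
            state.masks[state.next_id] = prop    # spawn a new track
            state.next_id += 1


def run_video(frames: List[np.ndarray],
              segment_image: Callable[[np.ndarray], List[np.ndarray]],
              propagate: Callable[[np.ndarray, Dict[int, np.ndarray]],
                                  Dict[int, np.ndarray]],
              detect_every: int = 5) -> List[Dict[int, np.ndarray]]:
    """Decoupled loop: propagate every frame, fuse detections periodically."""
    state = TrackState()
    outputs = []
    for t, frame in enumerate(frames):
        # Class-agnostic temporal propagation carries masks to frame t.
        state.masks = propagate(frame, state.masks)
        # Every `detect_every` frames, invoke the (cheaper-to-train)
        # task-specific image model and reconcile its hypotheses.
        if t % detect_every == 0:
            fuse(state, segment_image(frame))
        outputs.append(dict(state.masks))
    return outputs
```

For brevity this sketch propagates forward only; the paper's bi-directional propagation additionally denoises new detections against nearby frames within a small clip before merging them into the output.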
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Ref-DAVIS 2017 (val) | DEVA (ReferFormer) | J&F | 66.3 | — | Unverified |
| Refer-YouTube-VOS (2021 public validation) | DEVA (ReferFormer) | J&F | 66.0 | — | Unverified |