Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

2021-03-14 · CVPR 2021 · Code Available

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

Code

github.com/hkchengrex/MiVOS (official, PyTorch, ★ 485)
github.com/hkchengrex/Mask-Propagation (PyTorch, ★ 131)
github.com/hkchengrex/Scribble-to-Mask (PyTorch, ★ 90)
github.com/limingxing00/rde-vos-cvpr2022 (PyTorch, ★ 36)
github.com/Vujas-Eteph/CiVOS (PyTorch, ★ 6)

Abstract

We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user interactions to an object mask, which is then temporally propagated by our propagation module using a novel top-k filtering strategy when reading the space-time memory. To effectively take the user's intent into account, a novel difference-aware module is proposed to learn how to properly fuse the masks before and after each interaction, which are aligned with the target frames by employing the space-time memory. We evaluate our method both qualitatively and quantitatively with different forms of user interactions (e.g., scribbles, clicks) on DAVIS to show that it outperforms current state-of-the-art algorithms while requiring fewer frame interactions, with the additional advantage of generalizing to different types of user interactions. We contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation of 4.8M frames to accompany our source code and facilitate future research.
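
The top-k filtering idea mentioned in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function name `top_k_memory_read`, the use of a plain dot product as the affinity, and the list-based data layout are all assumptions made for illustration. The sketch keeps only the k best-matching memory entries for a query, applies a softmax over those entries alone, and returns the weighted sum of their values.

```python
import math


def top_k_memory_read(query, mem_keys, mem_values, k):
    """Hypothetical helper: read from a memory using only the k best matches.

    query      -- one query feature vector (list of floats)
    mem_keys   -- list of key vectors, each the same length as query
    mem_values -- list of value vectors, one per key
    k          -- number of memory entries to keep
    """
    # Assumption: affinity is a plain dot product between query and key.
    scores = [sum(q * m for q, m in zip(query, key)) for key in mem_keys]

    # Keep the indices of the k highest affinities; all others are dropped.
    k = min(k, len(scores))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax restricted to the kept entries (shifted by the max for stability).
    peak = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}

    # Weighted sum of the memory values that survived the filter.
    dim = len(mem_values[0])
    readout = [sum(weights[i] * mem_values[i][d] for i in top) for d in range(dim)]
    return readout, weights


# Toy memory: two strong matches, two weak ones with large (distracting) values.
keys = [[5.0, 0.0], [5.0, 0.0], [-1.0, 0.0], [0.0, 3.0]]
values = [[1.0], [3.0], [100.0], [200.0]]
readout, weights = top_k_memory_read([1.0, 0.0], keys, values, k=2)
```

In this toy setup the two weak matches receive no weight at all, whereas an unfiltered softmax over all four entries would let their large values leak into the readout. Limiting the softmax to the strongest matches is the intuition behind filtering the memory read.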

Tasks

Interactive Video Object Segmentation
Semantic Segmentation
Semi-Supervised Video Object Segmentation
Video Object Segmentation
Video Semantic Segmentation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
DAVIS 2016	MiVOS	J&F	91	—	Unverified
DAVIS-2017 (test-dev)	MiVOS	J&F	76.5	—	Unverified
DAVIS 2017 (val)	MiVOS	J&F	84.5	—	Unverified
YouTube-VOS 2018	MiVOS	Overall	82	—	Unverified
