Kernelized Memory Network for Video Object Segmentation

2020-07-16 · ECCV 2020 · Code Available

Hongje Seong, Junhyuk Hyun, Euntai Kim


Code

github.com/hkchengrex/Mask-Propagation
PyTorch · ★ 131

Abstract

Semi-supervised video object segmentation (VOS) is a task that involves predicting a target object in a video when the ground truth segmentation mask of the target object is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising solution for semi-supervised VOS. However, an important point is overlooked when applying STM to VOS. The solution (STM) is non-local, but the problem (VOS) is predominantly local. To solve the mismatch between STM and VOS, we propose a kernelized memory network (KMN). Before being trained on real videos, our KMN is pre-trained on static images, as in previous works. Unlike in previous works, we use the Hide-and-Seek strategy in pre-training to obtain the best possible results in handling occlusions and segment boundary extraction. The proposed KMN surpasses the state-of-the-art on standard benchmarks by a significant margin (+5% on DAVIS 2017 test-dev set). In addition, the runtime of KMN is 0.12 seconds per frame on the DAVIS 2016 validation set, and KMN rarely requires extra computation compared with STM.
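The Hide-and-Seek strategy mentioned in the abstract can be illustrated with a small sketch. This shows only the generic idea (hiding random grid cells of a training image so the model must cope with missing regions) and is not the authors' pre-training pipeline. The function name `hide_and_seek`, the grid size, the hide probability, and the fill value are all assumptions chosen for illustration.

```python
import random


def hide_and_seek(image, grid=4, hide_prob=0.5, fill=0, rng=None):
    """Return a copy of `image` with randomly chosen grid cells set to `fill`.

    `image` is a list of rows, each row a list of pixel values.
    All parameter names and defaults are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input is left untouched

    # Cell size, rounded up so the grid covers the whole image.
    cell_h = -(-height // grid)
    cell_w = -(-width // grid)

    for top in range(0, height, cell_h):
        for left in range(0, width, cell_w):
            # Hide each cell independently with probability `hide_prob`.
            if rng.random() < hide_prob:
                for y in range(top, min(top + cell_h, height)):
                    for x in range(left, min(left + cell_w, width)):
                        out[y][x] = fill
    return out


if __name__ == "__main__":
    img = [[1] * 8 for _ in range(8)]
    for row in hide_and_seek(img, rng=random.Random(0)):
        print(row)
```

In an augmentation like this, cells are hidden only in the training input; how the hidden regions interact with the segmentation target is a design choice this sketch does not cover.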

Tasks

Object Semantic Segmentation, Semi-Supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
DAVIS 2016	KMN	J&F	90.5	—	Unverified
DAVIS-2017 (test-dev)	KMN	J&F	77.2	—	Unverified
DAVIS 2017 (val)	KMN	J&F	82.8	—	Unverified
DAVIS (no YouTube-VOS training)	KMN	D17 val (G)	76	—	Unverified
YouTube-VOS 2018	KMN	Overall	81.4	—	Unverified
