TrickVOS: A Bag of Tricks for Video Object Segmentation

2023-06-27

Evangelos Skartados, Konstantinos Georgiadis, Mehmet Kerim Yucel, Koskinas Ioannis, Armando Domi, Anastasios Drosou, Bruno Manganelli, Albert Saa-Garriga

arXiv PDF

Abstract

Space-time memory (STM) network methods have been dominant in semi-supervised video object segmentation (SVOS) due to their remarkable performance. In this work, we identify three key aspects in which such methods can be improved: i) supervisory signal, ii) pretraining and iii) spatial awareness. We then propose TrickVOS, a generic, method-agnostic bag of tricks that addresses each aspect with i) a structure-aware hybrid loss, ii) a simple decoder pretraining regime and iii) a cheap tracker that imposes spatial constraints on model predictions. Finally, we propose a lightweight network and show that, when trained with TrickVOS, it achieves results competitive with state-of-the-art methods on the DAVIS and YouTube-VOS benchmarks, while being one of the first STM-based SVOS methods that can run in real time on a mobile device.
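The abstract describes the tracker only as a cheap module that imposes spatial constraints on model predictions. One plausible form, sketched here purely as an assumption and not as the authors' implementation, is a box prior: take the bounding box of the previous frame's predicted mask, enlarge it by a relative margin, and suppress foreground probability outside it. The function names `mask_bbox` and `apply_box_prior` and the parameters `margin` and `thresh` are hypothetical.

```python
import numpy as np

def mask_bbox(mask, thresh=0.5):
    """Half-open box (y0, y1, x0, x1) of pixels above thresh, or None if empty."""
    ys, xs = np.nonzero(mask > thresh)
    if ys.size == 0:
        return None
    return int(ys.min()), int(ys.max()) + 1, int(xs.min()), int(xs.max()) + 1

def apply_box_prior(prob, prev_mask, margin=0.1, thresh=0.5):
    """Hypothetical spatial constraint: zero out foreground probability
    outside the previous frame's object box, enlarged by `margin`
    (a fraction of the box height/width)."""
    box = mask_bbox(prev_mask, thresh)
    if box is None:
        return prob  # object not visible in the previous frame: no constraint
    y0, y1, x0, x1 = box
    h, w = prob.shape
    dy = int(round((y1 - y0) * margin))
    dx = int(round((x1 - x0) * margin))
    # Enlarge the box and clip it to the image bounds.
    y0, y1 = max(0, y0 - dy), min(h, y1 + dy)
    x0, x1 = max(0, x0 - dx), min(w, x1 + dx)
    gate = np.zeros_like(prob)
    gate[y0:y1, x0:x1] = 1.0
    return prob * gate

# Usage: a 2x2 object in a 10x10 frame, box enlarged by 50% on each side.
prev = np.zeros((10, 10))
prev[4:6, 4:6] = 1.0
prob = np.full((10, 10), 0.9)
out = apply_box_prior(prob, prev, margin=0.5)
```

The prior is a single element-wise multiply per frame, which is consistent with the "cheap" requirement; a real tracker would likely also handle fast motion and reappearing objects, which this sketch ignores.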

Tasks

Decoder, Object, Semantic Segmentation, Semi-Supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
DAVIS 2016	Lightweight TrickVOS (PT)	J&F	89.3	—	Unverified
DAVIS 2016	STCN + TrickVOS (PT)	J&F	91.8	—	Unverified
DAVIS 2016	STCN + TrickVOS (PT)	Speed (FPS)	45.4	—	Unverified
DAVIS 2017	STCN + TrickVOS (PT)	F-measure (Mean)	89.6	—	Unverified
DAVIS 2017	Lightweight TrickVOS (PT)	F-measure (Mean)	86	—	Unverified
YouTube-VOS 2019	STCN + TrickVOS (PT)	Jaccard (Seen)	82.1	—	Unverified
YouTube-VOS 2019	Lightweight TrickVOS (PT)	Jaccard (Seen)	79.5	—	Unverified
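On DAVIS, J&F is the mean of region similarity J (the Jaccard index, or IoU, between predicted and ground-truth masks) and contour accuracy F (a boundary F-measure); the YouTube-VOS "Jaccard (Seen)" column reports J on object categories seen during training. A minimal sketch of J and the final averaging follows, assuming binary masks; it mirrors the common convention of scoring J = 1 when both masks are empty. F needs boundary extraction and tolerance-based matching and is omitted, so `f_mean` is taken as given. The function names are illustrative.

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: |pred AND gt| / |pred OR gt| for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)

def j_and_f(j_mean, f_mean):
    """J&F: arithmetic mean of mean J and mean F over the evaluated objects."""
    return (j_mean + f_mean) / 2.0
```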
