Putting the Object Back into Video Object Segmentation

2023-10-19 · CVPR 2024

Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing


Code

github.com/hkchengrex/Cutie
Official · PyTorch · ★ 1,022

Abstract

We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result. Recent works on VOS employ bottom-up pixel-level memory reading which struggles due to matching noise, especially in the presence of distractors, resulting in lower performance in more challenging data. In contrast, Cutie performs top-down object-level memory reading by adapting a small set of object queries. Via those, it interacts with the bottom-up pixel features iteratively with a query-based object transformer (qt, hence Cutie). The object queries act as a high-level summary of the target object, while high-resolution feature maps are retained for accurate segmentation. Together with foreground-background masked attention, Cutie cleanly separates the semantics of the foreground object from the background. On the challenging MOSE dataset, Cutie improves by 8.7 J&F over XMem with a similar running time and improves by 4.2 J&F over DeAOT while being three times faster. Code is available at: https://hkchengrex.github.io/Cutie
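The foreground-background masked attention described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes single-head scaled dot-product attention with no learned projections, and a per-pixel foreground probability supplied from some coarse mask estimate. Splitting the object queries into a foreground half and a background half is an assumption of this sketch, and the function names `softmax` and `masked_object_attention` are hypothetical; they are not taken from the Cutie code base.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Stable softmax; entries equal to -inf get exactly zero weight."""
    finite = [s for s in scores if s != float("-inf")]
    m = max(finite)  # caller guarantees at least one finite score
    exps = [math.exp(s - m) if s != float("-inf") else 0.0 for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_object_attention(
    queries: List[Vector],
    pixels: List[Vector],
    fg_prob: List[float],
    threshold: float = 0.5,
) -> Tuple[List[Vector], List[List[float]]]:
    """One simplified object-query read step with fg/bg masked attention.

    Assumption of this sketch: the first half of `queries` may attend only
    to pixels whose foreground probability is >= threshold, and the second
    half only to the remaining (background) pixels. Returns the
    residual-updated queries and each query's attention weights.
    """
    half = len(queries) // 2
    dim = len(pixels[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    is_fg = [p >= threshold for p in fg_prob]
    updated, attention = [], []
    for i, q in enumerate(queries):
        wants_fg = i < half
        allowed = [f == wants_fg for f in is_fg]
        if not any(allowed):
            # Hypothetical fallback: if the mask leaves nothing to attend to,
            # drop the mask for this query instead of dividing by zero.
            allowed = [True] * len(pixels)
        scores = [
            sum(a * b for a, b in zip(q, p)) * scale if ok else float("-inf")
            for p, ok in zip(pixels, allowed)
        ]
        weights = softmax(scores)
        # Attention-weighted sum of the pixel features this query may see.
        read = [sum(w * p[d] for w, p in zip(weights, pixels)) for d in range(dim)]
        updated.append([a + b for a, b in zip(q, read)])  # residual update
        attention.append(weights)
    return updated, attention
```

Each foreground query ends up summarizing only foreground pixels and each background query only background pixels, which is the separation the abstract attributes to masked attention. The sketch covers a single read step; the paper's query-based object transformer repeats the interaction iteratively and, presumably, uses learned projections and further layers that are omitted here.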

Tasks

Object Segmentation, Semantic Segmentation, Semi-Supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation, Visual Object Tracking

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
BURST-test	Cutie (base, MEGA, 600 pixels)	HOTA (all)	66	—	Unverified
BURST-test	Cutie (base, with MOSE, 600 pixels)	HOTA (all)	62.6	—	Unverified
BURST-val	Cutie (base, with MOSE, 600 pixels)	HOTA (all)	58.4	—	Unverified
BURST-val	Cutie (base, MEGA, 600 pixels)	HOTA (all)	61.2	—	Unverified
DAVIS-2017 (test-dev)	Cutie+ (base)	J&F	85.9	—	Unverified
DAVIS-2017 (test-dev)	Cutie (base, MEGA)	J&F	86.1	—	Unverified
DAVIS-2017 (test-dev)	Cutie+ (base, MEGA)	J&F	88.1	—	Unverified
DAVIS 2017 (val)	Cutie+ (base, MEGA)	J&F	88.1	—	Unverified
DAVIS 2017 (val)	Cutie (base)	J&F	87.9	—	Unverified
DAVIS 2017 (val)	Cutie+ (base)	J&F	90.5	—	Unverified
MOSE	Cutie+ (base, MEGA)	J&F	71.7	—	Unverified
MOSE	Cutie+ (small, MEGA)	J&F	70.3	—	Unverified
MOSE	Cutie (small, MEGA)	J&F	68.6	—	Unverified
MOSE	Cutie (base, with MOSE)	J&F	68.3	—	Unverified
MOSE	Cutie (small, with MOSE)	J&F	67.4	—	Unverified
MOSE	Cutie (base)	J&F	64	—	Unverified
MOSE	Cutie (small)	J&F	62.2	—	Unverified
MOSE	Cutie (base, MEGA)	J&F	69.9	—	Unverified
YouTube-VOS 2018	Cutie+ (base, MEGA)	Overall	87.5	—	Unverified
YouTube-VOS 2019	Cutie+ (base, MEGA)	Overall	87.5	—	Unverified
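The J&F numbers in the abstract and in the DAVIS and MOSE rows follow the DAVIS convention: the mean of region similarity J (intersection-over-union of the predicted and ground-truth masks) and contour accuracy F (an F-measure over boundary pixels matched within a distance tolerance), with the benchmark averaging these scores over frames and objects. The following is a simplified pure-Python sketch for a single binary mask pair. The 4-neighbour boundary rule, the square (Chebyshev) tolerance window, and the function names are assumptions of this sketch; the official DAVIS evaluation code extracts boundaries differently and by default sets the tolerance as a fraction of the image diagonal.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]  # 1 = object, 0 = background
Pixel = Tuple[int, int]


def region_similarity(pred: Mask, gt: Mask) -> float:
    """J: intersection-over-union of two binary masks."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += bool(p and g)
            union += bool(p or g)
    return 1.0 if union == 0 else inter / union


def boundary(mask: Mask) -> Set[Pixel]:
    """Object pixels with at least one 4-neighbour outside the object."""
    h, w = len(mask), len(mask[0])
    edge = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    edge.add((y, x))
                    break
    return edge


def _matched(src: Set[Pixel], dst: Set[Pixel], tol: int) -> int:
    """Count src pixels that have a dst pixel within Chebyshev distance tol."""
    return sum(
        any(max(abs(y - v), abs(x - u)) <= tol for v, u in dst) for y, x in src
    )


def contour_accuracy(pred: Mask, gt: Mask, tol: int = 1) -> float:
    """F: harmonic mean of boundary precision and recall."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp and not bg:
        return 1.0
    if not bp or not bg:
        return 0.0
    precision = _matched(bp, bg, tol) / len(bp)
    recall = _matched(bg, bp, tol) / len(bg)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred: Mask, gt: Mask, tol: int = 1) -> float:
    """J&F for one mask pair: the mean of J and F."""
    return (region_similarity(pred, gt) + contour_accuracy(pred, gt, tol)) / 2
```

For example, a prediction covering only the top row of a 2x2 ground-truth square gets J = 0.5 and, with zero tolerance, F = 2/3, so its J&F is about 0.583.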