Putting the Object Back into Video Object Segmentation
Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing
Code
- github.com/hkchengrex/Cutie (official, PyTorch, ★ 1,022)
Abstract
We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result. Recent works on VOS employ bottom-up pixel-level memory reading, which struggles due to matching noise, especially in the presence of distractors, resulting in lower performance on more challenging data. In contrast, Cutie performs top-down object-level memory reading by adapting a small set of object queries. Through these queries, it interacts with the bottom-up pixel features iteratively using a query-based object transformer (qt, hence Cutie). The object queries act as a high-level summary of the target object, while high-resolution feature maps are retained for accurate segmentation. Together with foreground-background masked attention, Cutie cleanly separates the semantics of the foreground object from the background. On the challenging MOSE dataset, Cutie improves by 8.7 J&F over XMem with a similar running time and improves by 4.2 J&F over DeAOT while being three times faster. Code is available at: https://hkchengrex.github.io/Cutie
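To make the idea of foreground-background masked attention more concrete, here is a minimal NumPy sketch of a single cross-attention step from object queries to pixel features. It assumes the queries are split into two halves, with the first half restricted to pixels currently predicted as foreground and the second half restricted to background pixels. That split, the function name `fg_bg_masked_attention`, and the fallback for empty masks are illustrative assumptions, not details stated in the abstract, and the sketch omits learned projections, multiple heads, and the iterative refinement of the actual model.

```python
import numpy as np

def fg_bg_masked_attention(queries, pixel_feats, fg_mask):
    """Single-head cross-attention with a foreground/background mask.

    queries:     (N, C) object queries; N is assumed to be even
    pixel_feats: (P, C) flattened pixel features (P = H * W)
    fg_mask:     (P,) boolean, True where a pixel is predicted foreground
    Returns the updated queries (N, C) and the attention weights (N, P).
    """
    n, c = queries.shape
    half = n // 2

    # allowed[i, j] is True when query i may attend to pixel j.
    allowed = np.empty((n, pixel_feats.shape[0]), dtype=bool)
    allowed[:half] = fg_mask    # first half: foreground pixels only (assumption)
    allowed[half:] = ~fg_mask   # second half: background pixels only (assumption)

    # Assumed fallback: a query with no allowed pixel attends everywhere,
    # which avoids a softmax over an empty set.
    empty = ~allowed.any(axis=1)
    allowed[empty] = True

    # Scaled dot-product logits, with disallowed positions set to -inf.
    logits = queries @ pixel_feats.T / np.sqrt(c)
    logits = np.where(allowed, logits, -np.inf)

    # Numerically stable softmax over pixels.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ pixel_feats, weights

# Toy usage: 4 queries, 6 pixels, 8 channels.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
x = rng.normal(size=(6, 8))
mask = np.array([True, True, False, False, False, True])
out, w = fg_bg_masked_attention(q, x, mask)
```

In this toy run the first two queries place zero weight on background pixels and the last two place zero weight on foreground pixels, so each half summarizes only its own region.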
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| BURST-test | Cutie (base, MEGA, 600 pixels) | HOTA (all) | 66 | — | Unverified |
| BURST-test | Cutie (base, with MOSE, 600 pixels) | HOTA (all) | 62.6 | — | Unverified |
| BURST-val | Cutie (base, with MOSE, 600 pixels) | HOTA (all) | 58.4 | — | Unverified |
| BURST-val | Cutie (base, MEGA, 600 pixels) | HOTA (all) | 61.2 | — | Unverified |
| DAVIS-2017 (test-dev) | Cutie+ (base) | J&F | 85.9 | — | Unverified |
| DAVIS-2017 (test-dev) | Cutie (base, MEGA) | J&F | 86.1 | — | Unverified |
| DAVIS-2017 (test-dev) | Cutie+ (base, MEGA) | J&F | 88.1 | — | Unverified |
| DAVIS 2017 (val) | Cutie+ (base, MEGA) | J&F | 88.1 | — | Unverified |
| DAVIS 2017 (val) | Cutie (base) | J&F | 87.9 | — | Unverified |
| DAVIS 2017 (val) | Cutie+ (base) | J&F | 90.5 | — | Unverified |
| MOSE | Cutie+ (base, MEGA) | J&F | 71.7 | — | Unverified |
| MOSE | Cutie+ (small, MEGA) | J&F | 70.3 | — | Unverified |
| MOSE | Cutie (small, MEGA) | J&F | 68.6 | — | Unverified |
| MOSE | Cutie (base, with MOSE) | J&F | 68.3 | — | Unverified |
| MOSE | Cutie (small, with MOSE) | J&F | 67.4 | — | Unverified |
| MOSE | Cutie (base) | J&F | 64 | — | Unverified |
| MOSE | Cutie (small) | J&F | 62.2 | — | Unverified |
| MOSE | Cutie (base, MEGA) | J&F | 69.9 | — | Unverified |
| YouTube-VOS 2018 | Cutie+ (base, MEGA) | Overall | 87.5 | — | Unverified |
| YouTube-VOS 2019 | Cutie+ (base, MEGA) | Overall | 87.5 | — | Unverified |
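The J&F metric in the table above is the average of region similarity J (the intersection-over-union between predicted and ground-truth masks) and contour accuracy F (a boundary F-measure). The sketch below shows J for a single mask pair and the final averaging step. The function names are illustrative, the handling of two empty masks is an assumed convention, and F is not implemented here because it requires boundary matching; official benchmark toolkits should be used for reported numbers.

```python
import numpy as np

def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def j_and_f(j_mean, f_mean):
    """J&F is the arithmetic mean of the J and F scores."""
    return (j_mean + f_mean) / 2.0

# Toy usage: the prediction covers two pixels, the ground truth covers one of them.
pred = [[1, 1], [0, 0]]
gt = [[1, 0], [0, 0]]
j = region_similarity(pred, gt)   # intersection 1, union 2
score = j_and_f(80.0, 90.0)       # hypothetical J and F values, not from the table
```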