XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

2022-07-14

Ho Kei Cheng, Alexander G. Schwing

Abstract

We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically uses only one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact, and thus sustained, long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets. Code is available at https://hkchengrex.github.io/XMem
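
To make the consolidation idea in the abstract concrete, here is a toy sketch of how frequently read working-memory entries might be promoted into a long-term store once the working store fills up. It is not the authors' implementation: the class `ToyMemory`, its parameters `working_capacity` and `keep_top`, and the "rank by read count" rule are all hypothetical simplifications chosen for illustration, and the sensory memory is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical two-store memory: a bounded working store plus a long-term store."""
    working_capacity: int = 4                      # assumed limit on working entries
    keep_top: int = 2                              # assumed number of entries promoted
    working: dict = field(default_factory=dict)    # key -> [feature, usage_count]
    long_term: list = field(default_factory=list)  # promoted (key, feature) pairs

    def write(self, key, feature):
        # Consolidate before writing so the newest entry always stays in working memory.
        if len(self.working) >= self.working_capacity:
            self.consolidate()
        self.working[key] = [feature, 0]

    def read(self, key):
        # Each read counts as "active use" of that entry.
        entry = self.working[key]
        entry[1] += 1
        return entry[0]

    def consolidate(self):
        # Promote the most-used entries, then discard the rest of working memory.
        ranked = sorted(self.working.items(), key=lambda kv: kv[1][1], reverse=True)
        for key, (feature, _usage) in ranked[: self.keep_top]:
            self.long_term.append((key, feature))
        self.working.clear()


mem = ToyMemory(working_capacity=3, keep_top=1)
for name, feat in [("f0", "a"), ("f1", "b"), ("f2", "c")]:
    mem.write(name, feat)
mem.read("f1")
mem.read("f1")
mem.read("f0")
mem.write("f3", "d")  # working store is full, so consolidation runs first
```

The point of the sketch is only that total memory stays bounded: working memory never exceeds `working_capacity`, and each consolidation adds at most `keep_top` entries to the long-term store.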

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| DAVIS 2016 | XMem (DAVIS only) | J&F | 87.8 | | Unverified |
| DAVIS 2016 | XMem (DAVIS+YouTubeVOS only) | J&F | 90.8 | | Unverified |
| DAVIS 2016 | XMem (BL30K) | J&F | 92 | | Unverified |
| DAVIS 2016 | XMem (MS) | J&F | 92.7 | | Unverified |
| DAVIS-2017 (test-dev) | XMem (DAVIS and YouTubeVOS only) | J&F | 79.8 | | Unverified |
| DAVIS-2017 (test-dev) | XMem (MS) | J&F | 83.1 | | Unverified |
| DAVIS-2017 (test-dev) | XMem (BL30K, MS) | J&F | 83.7 | | Unverified |
| DAVIS-2017 (test-dev) | XMem (BL30K, 600p) | J&F | 82.5 | | Unverified |
| DAVIS-2017 (test-dev) | XMem (BL30K) | J&F | 81.2 | | Unverified |
| DAVIS-2017 (test-dev) | XMem | J&F | 81 | | Unverified |
| DAVIS 2017 (val) | XMem | J&F | 86.2 | | Unverified |
| DAVIS 2017 (val) | XMem (BL30K, MS) | J&F | 89.5 | | Unverified |
| DAVIS 2017 (val) | XMem (MS) | J&F | 88.2 | | Unverified |
| DAVIS 2017 (val) | XMem (BL30K) | J&F | 87.7 | | Unverified |
| DAVIS 2017 (val) | XMem (DAVIS and YouTubeVOS only) | J&F | 84.5 | | Unverified |
| DAVIS 2017 (val) | XMem (DAVIS only) | J&F | 76.7 | | Unverified |
| DAVIS (no YouTube-VOS training) | XMem | FPS | 29.6 | | Unverified |
| MOSE | XMem | J&F | 57.6 | | Unverified |
| YouTube-VOS 2018 | XMem | Overall | 85.7 | | Unverified |
| YouTube-VOS 2018 | XMem (YouTubeVOS only) | Overall | 84.4 | | Unverified |
| YouTube-VOS 2018 | XMem (MS) | Overall | 86.7 | | Unverified |
| YouTube-VOS 2018 | XMem (BL30K) | Overall | 86.1 | | Unverified |
| YouTube-VOS 2018 | XMem (BL30K, MS) | Overall | 86.9 | | Unverified |
| YouTube-VOS 2019 | XMem (BL30K) | Overall | 85.8 | | Unverified |
| YouTube-VOS 2019 | XMem | Overall | 84.3 | | Unverified |
| YouTube-VOS 2019 | XMem (BL30K, MS) | Overall | 86.8 | | Unverified |
| YouTube-VOS 2019 | XMem (MS) | Overall | 86.4 | | Unverified |
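
For readers unfamiliar with the J&F metric reported above: on the DAVIS benchmarks it is the mean of region similarity J (the Jaccard index, or intersection over union, of predicted and ground-truth masks) and contour accuracy F (a boundary F-measure). The sketch below computes J for masks given as nested lists of 0/1 and averages it with an F value supplied by the caller. The function names are hypothetical, the boundary-matching step needed to compute F is not shown, and returning 1.0 when both masks are empty is an assumed convention.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """J&F: the arithmetic mean of region similarity J and contour accuracy F."""
    return (j + f) / 2


pred = [[1, 1], [0, 0]]
gt = [[1, 0], [1, 0]]
j = jaccard(pred, gt)  # one shared pixel out of three covered pixels
```

Benchmark figures such as those in the table are percentages averaged over objects and sequences, so a per-frame value like `j` here would be only one term in that average.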