XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

2022-07-14

Ho Kei Cheng, Alexander G. Schwing

Code

github.com/hkchengrex/XMem (official, PyTorch, ★ 1,962)
github.com/tianyuan168326/videosemanticcompression-pytorch (PyTorch, ★ 37)

Abstract

We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically uses only one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact and thus sustained long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets. Code is available at https://hkchengrex.github.io/XMem
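
The consolidation idea in the abstract (moving actively used working memory elements into a compact long-term store) can be illustrated with a toy sketch. This is not the XMem implementation: the class names, the scalar `usage` counter, the `working_capacity` and `keep_top` parameters, and the rule of keeping only the newest element in working memory are all assumptions made for illustration. The abstract does not specify how usage is measured or how elements are selected.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryElement:
    key: str
    usage: float = 0.0  # hypothetical bookkeeping: accumulated read weight


@dataclass
class TwoStoreMemory:
    """Toy working/long-term memory pair (illustrative only, not XMem's code)."""
    working_capacity: int  # assumed limit on working memory size
    keep_top: int          # assumed number of elements promoted per consolidation
    working: list = field(default_factory=list)
    long_term: list = field(default_factory=list)

    def add(self, key: str) -> None:
        # New elements always enter working memory first.
        self.working.append(MemoryElement(key))
        if len(self.working) > self.working_capacity:
            self.consolidate()

    def read(self, key: str, weight: float = 1.0) -> None:
        # Reading an element increases its usage score.
        for element in self.working:
            if element.key == key:
                element.usage += weight

    def consolidate(self) -> None:
        # Keep the newest element in working memory; promote the most-used
        # of the remaining elements to long-term memory and drop the rest.
        *candidates, newest = self.working
        ranked = sorted(candidates, key=lambda e: e.usage, reverse=True)
        self.long_term.extend(ranked[: self.keep_top])
        self.working = [newest]


mem = TwoStoreMemory(working_capacity=3, keep_top=1)
for name in ("a", "b", "c"):
    mem.add(name)
mem.read("b", 2.0)
mem.read("a", 0.5)
mem.add("d")  # exceeds capacity, so consolidation runs
print([e.key for e in mem.long_term], [e.key for e in mem.working])
```

In this sketch the long-term store grows by at most `keep_top` elements per consolidation, so its size depends on how often consolidation runs rather than on the number of frames seen. That is the kind of bounded growth the abstract describes as avoiding memory explosion.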

Tasks

2D Human Pose Estimation
2D Object Detection
3D Absolute Human Pose Estimation
Segmentation
Semantic Segmentation
Semi-Supervised Video Object Segmentation
Video Object Segmentation
Video Semantic Segmentation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
DAVIS 2016	XMem (DAVIS only)	J&F	87.8	—	Unverified
DAVIS 2016	XMem (DAVIS+YouTubeVOS only)	J&F	90.8	—	Unverified
DAVIS 2016	XMem (BL30K)	J&F	92	—	Unverified
DAVIS 2016	XMem (MS)	J&F	92.7	—	Unverified
DAVIS-2017 (test-dev)	XMem (DAVIS and YouTubeVOS only)	J&F	79.8	—	Unverified
DAVIS-2017 (test-dev)	XMem (MS)	J&F	83.1	—	Unverified
DAVIS-2017 (test-dev)	XMem (BL30K, MS)	J&F	83.7	—	Unverified
DAVIS-2017 (test-dev)	XMem (BL30K, 600p)	J&F	82.5	—	Unverified
DAVIS-2017 (test-dev)	XMem (BL30K)	J&F	81.2	—	Unverified
DAVIS-2017 (test-dev)	XMem	J&F	81	—	Unverified
DAVIS 2017 (val)	XMem	J&F	86.2	—	Unverified
DAVIS 2017 (val)	XMem (BL30K, MS)	J&F	89.5	—	Unverified
DAVIS 2017 (val)	XMem (MS)	J&F	88.2	—	Unverified
DAVIS 2017 (val)	XMem (BL30K)	J&F	87.7	—	Unverified
DAVIS 2017 (val)	XMem (DAVIS and YouTubeVOS only)	J&F	84.5	—	Unverified
DAVIS 2017 (val)	XMem (DAVIS only)	J&F	76.7	—	Unverified
DAVIS (no YouTube-VOS training)	XMem	FPS	29.6	—	Unverified
MOSE	XMem	J&F	57.6	—	Unverified
YouTube-VOS 2018	XMem	Overall	85.7	—	Unverified
YouTube-VOS 2018	XMem (YouTubeVOS only)	Overall	84.4	—	Unverified
YouTube-VOS 2018	XMem (MS)	Overall	86.7	—	Unverified
YouTube-VOS 2018	XMem (BL30K)	Overall	86.1	—	Unverified
YouTube-VOS 2018	XMem (BL30K, MS)	Overall	86.9	—	Unverified
YouTube-VOS 2019	XMem (BL30K)	Overall	85.8	—	Unverified
YouTube-VOS 2019	XMem	Overall	84.3	—	Unverified
YouTube-VOS 2019	XMem (BL30K, MS)	Overall	86.8	—	Unverified
YouTube-VOS 2019	XMem (MS)	Overall	86.4	—	Unverified
