Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation
Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
Code
- Official PyTorch implementation: github.com/SysCV/pcan (★ 366)
Abstract
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single-frame predictions for the segmentation mask itself. We propose the Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from past frames. To segment each object, PCAN adopts a prototypical appearance module that learns a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both the YouTube-VIS and BDD100K datasets, and that it is effective in both one-stage and two-stage segmentation frameworks. Code and video resources are available at http://vis.xyz/pub/pcan.
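The two operations the abstract describes, distilling a space-time memory into prototypes and reading it back through cross-attention, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the repository above): the function names `distill_prototypes` and `prototypical_cross_attention`, the soft k-means (EM-style) update, and the scaled dot-product similarity are all illustrative assumptions.

```python
import torch


def distill_prototypes(memory: torch.Tensor, num_prototypes: int = 64,
                       iters: int = 3) -> torch.Tensor:
    """Condense a flattened space-time memory of shape (N, C) into
    (K, C) prototypes via a few soft-clustering (EM-style) iterations.
    Hypothetical sketch; not the official PCAN code."""
    n, c = memory.shape
    # Initialize prototypes from a random subset of memory features.
    idx = torch.randperm(n)[:num_prototypes]
    prototypes = memory[idx].clone()
    for _ in range(iters):
        # E-step: soft-assign every memory feature to each prototype.
        assign = (memory @ prototypes.t() / c ** 0.5).softmax(dim=1)  # (N, K)
        # M-step: update prototypes as assignment-weighted means.
        prototypes = (assign.t() @ memory) / (assign.sum(dim=0, keepdim=True).t() + 1e-6)
    return prototypes


def prototypical_cross_attention(query: torch.Tensor,
                                 prototypes: torch.Tensor) -> torch.Tensor:
    """Retrieve temporal context for current-frame features (M, C) by
    attending to the K distilled prototypes instead of the full memory."""
    attn = (query @ prototypes.t() / query.shape[1] ** 0.5).softmax(dim=1)  # (M, K)
    return attn @ prototypes  # (M, C)
```

Because each current-frame query attends to K prototypes rather than every space-time memory location, the read-out cost scales with K instead of the number of past frames times their spatial resolution, which is what makes attending over long video history tractable online.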
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| BDD100K val | PCAN | mMOTSA | 27.4 | — | Unverified |
| BDD100K val | QDTrack-mots-fix | mMOTSA | 23.5 | — | Unverified |
| BDD100K val | QDTrack-mots | mMOTSA | 22.5 | — | Unverified |
| BDD100K val | MaskTrackRCNN | mMOTSA | 12.3 | — | Unverified |
| BDD100K val | STEm-Seg | mMOTSA | 12.2 | — | Unverified |
| BDD100K val | SortIoU | mMOTSA | 10.3 | — | Unverified |
| YouTube-VIS validation | PCAN (ResNet-50) | mask AP | 36.1 | — | Unverified |