SOTAVerified

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

2021-06-22 · NeurIPS 2021 · Code Available

Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

Abstract

Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from the past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both Youtube-VIS and BDD100K datasets, and shows efficacy to both one-stage and two-stage segmentation frameworks. Code and video resources are available at http://vis.xyz/pub/pcan.
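The abstract describes two steps: distilling a space-time memory into a small set of prototypes, then cross-attending queries against those prototypes instead of the full memory. The sketch below is a minimal NumPy illustration of that idea, not the paper's implementation: it assumes soft k-means as the distillation step (PCAN's actual clustering and attention are learned modules), and all function names and shapes here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distill_prototypes(memory, k=8, iters=3, seed=0):
    """Condense memory features (N, D) into k prototypes via soft k-means,
    standing in for the paper's learned EM-style distillation."""
    rng = np.random.default_rng(seed)
    protos = memory[rng.choice(len(memory), size=k, replace=False)]
    for _ in range(iters):
        # E-step: soft-assign each memory feature to the prototypes
        assign = softmax(memory @ protos.T, axis=1)              # (N, k)
        # M-step: prototypes become assignment-weighted means
        protos = (assign.T @ memory) / assign.sum(0)[:, None]    # (k, D)
    return protos

def prototypical_cross_attention(query, protos):
    """Attend query features (M, D) over k prototypes rather than the
    full memory, so cost scales with k instead of memory size N."""
    attn = softmax(query @ protos.T / np.sqrt(query.shape[1]), axis=1)  # (M, k)
    return attn @ protos                                                # (M, D)

# Toy usage: 256 memory features, 16 query features, 32-dim embeddings
mem = np.random.default_rng(1).normal(size=(256, 32))
q = np.random.default_rng(2).normal(size=(16, 32))
out = prototypical_cross_attention(q, distill_prototypes(mem, k=8))
print(out.shape)  # (16, 32)
```

The efficiency argument is that attention cost drops from O(M·N) over the raw memory to O(M·k) over the prototypes, which is what makes attending over many past frames tractable online.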

Tasks

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| BDD100K val | QDTrack-mots | mMOTSA | 22.5 | — | Unverified |
| BDD100K val | PCAN | mMOTSA | 27.4 | — | Unverified |
| BDD100K val | QDTrack-mots-fix | mMOTSA | 23.5 | — | Unverified |
| BDD100K val | SortIoU | mMOTSA | 10.3 | — | Unverified |
| BDD100K val | MaskTrackRCNN | mMOTSA | 12.3 | — | Unverified |
| BDD100K val | STEm-Seg | mMOTSA | 12.2 | — | Unverified |
| YouTube-VIS validation | PCAN (ResNet-50) | mask AP | 36.1 | — | Unverified |

Reproductions