ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe

2023-12-28 · CVPR 2024 · Code Available

Yifan Bai, Zeyang Zhao, Yihong Gong, Xing Wei


Code

github.com/miv-xjtu/artrack
Official · PyTorch · ★ 305

Abstract

We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames. Building on the foundation of its predecessor, ARTrackV2 extends the concept by introducing a unified generative framework that "reads out" the object's trajectory and "retells" its appearance in an autoregressive manner. This approach yields a time-continuous methodology that models the joint evolution of motion and visual features, guided by previous estimates. Furthermore, ARTrackV2 stands out for its efficiency and simplicity, eliminating the less efficient intra-frame autoregression and the hand-tuned parameters used for appearance updates. Despite its simplicity, ARTrackV2 achieves state-of-the-art performance on prevailing benchmark datasets while delivering a marked efficiency improvement. In particular, ARTrackV2 achieves an AO score of 79.5% on GOT-10k and an AUC of 86.1% on TrackingNet while being 3.6× faster than ARTrack. The code will be released.
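The abstract describes reading out the trajectory autoregressively, guided by previous estimates. The sketch below illustrates only the bookkeeping side of that idea under stated assumptions: each box is quantized into discrete coordinate tokens, and the tokens from the last few frames are kept as a trajectory prompt for the next frame. The bin count (`NUM_BINS`), the [0, 1] normalization relative to the search region, the history length, and every name here (`box_to_tokens`, `tokens_to_box`, `TrajectoryPrompt`) are hypothetical illustrations, not the official ARTrackV2 code. Because the abstract says ARTrackV2 drops intra-frame autoregression, the sketch treats a frame's four tokens as one unit rather than generating them one at a time. The network that consumes the prompt is not shown.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

# Hypothetical bin count (assumption, not taken from the paper).
NUM_BINS = 400

# (x1, y1, x2, y2), assumed normalized to [0, 1] relative to the search region.
Box = Tuple[float, float, float, float]


def box_to_tokens(box: Box, num_bins: int = NUM_BINS) -> List[int]:
    """Quantize each normalized coordinate into an integer bin index."""
    tokens = []
    for c in box:
        c = min(max(c, 0.0), 1.0)  # clamp to the valid range
        tokens.append(min(int(c * num_bins), num_bins - 1))  # 1.0 maps to the last bin
    return tokens


def tokens_to_box(tokens: Sequence[int], num_bins: int = NUM_BINS) -> Box:
    """Map bin indices back to the centers of their bins."""
    x1, y1, x2, y2 = ((t + 0.5) / num_bins for t in tokens)
    return (x1, y1, x2, y2)


class TrajectoryPrompt:
    """Hypothetical helper: keeps the last `history` frames' box tokens."""

    def __init__(self, history: int = 3):
        # deque with maxlen silently drops the oldest frame when full.
        self.buffer: Deque[List[int]] = deque(maxlen=history)

    def update(self, box: Box) -> None:
        self.buffer.append(box_to_tokens(box))

    def as_sequence(self) -> List[int]:
        # Flatten oldest-to-newest so the trajectory is presented in time order.
        return [tok for frame in self.buffer for tok in frame]
```

In a tracking loop, one would (under these assumptions) feed `as_sequence()` to the model together with the template and search images, decode the predicted tokens with `tokens_to_box`, and push the result back with `update`. Decoding to bin centers bounds the quantization error per coordinate by one bin width, which is why the bin count trades vocabulary size against localization precision.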

Tasks

Object, Object Tracking, Template Matching, Video Object Tracking, Visual Object Tracking

Benchmark Results

Dataset	Model	Metric	Claimed (%)	Verified	Status
GOT-10k	ARTrackV2-L	Average Overlap	79.5	—	Unverified
LaSOT	ARTrackV2-L	AUC	73.6	—	Unverified
LaSOT-ext	ARTrackV2-L	AUC	53.4	—	Unverified
NeedForSpeed	ARTrackV2-L	AUC	68	—	Unverified
TNL2K	ARTrackV2-L	AUC	61.6	—	Unverified
TrackingNet	ARTrackV2-L	AUC	86.1	—	Unverified
UAV123	ARTrackV2-L	AUC	72	—	Unverified
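To read the table, it helps to recall how its two metrics are conventionally computed from per-frame IoU between predicted and ground-truth boxes. The sketch below follows the common definitions: average overlap (AO) is the mean IoU over frames, and success AUC is the area under the success plot, approximated as the mean success rate over evenly spaced overlap thresholds. It is an illustrative approximation, not any benchmark's official toolkit; the threshold grid, the strict `>` comparison, and the function names are assumptions, and real toolkits differ in details such as how they handle missing or absent-target frames.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes in (x, y, w, h) form."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_overlap(ious: Sequence[float]) -> float:
    """AO in the GOT-10k sense: mean IoU over all evaluated frames."""
    return sum(ious) / len(ious)


def success_auc(ious: Sequence[float], num_thresholds: int = 21) -> float:
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success rate at threshold t: fraction of frames whose IoU exceeds t.
    rates = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)
```

As the threshold grid becomes finer, the success AUC converges to the mean IoU, because the integral of P(IoU > t) over t in [0, 1] equals the expected IoU. The two metrics therefore measure closely related quantities, although the coarse grids used in practice keep them from matching exactly.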
