TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

2023-06-14 · ICCV 2023 · Code Available

Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman

Code

github.com/deepmind/tapnet
Official · In paper · jax · ★ 1,820
github.com/riponazad/echotracker
pytorch · ★ 56
github.com/ibaiGorordo/Tapir-Pytorch-Inference
pytorch · ★ 18

Abstract

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time, and can be flexibly extended to higher-resolution videos. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found on our project webpage.
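
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (match_per_frame, refine_track), the hard argmax over a dot-product cost volume, the fixed search radius, and the feature-blending rule are all assumptions chosen to keep the example small. It only shows the overall shape of the idea: a per-frame, independent match followed by an iterative update of both the trajectory and the query features from local correlations.

```python
import numpy as np


def match_per_frame(query_feat, frame_feats):
    """Stage 1 (sketch): find a candidate match independently in every frame.

    query_feat:  (C,) feature vector sampled at the query point.
    frame_feats: (T, H, W, C) per-frame feature maps.
    Returns a (T, 2) array of (x, y) positions on the feature grid.
    """
    T, H, W, _ = frame_feats.shape
    # Cost volume: dot product between the query and every spatial location.
    cost = np.einsum("thwc,c->thw", frame_feats, query_feat)
    # Assumption: a hard argmax per frame stands in for the learned matching head.
    idx = cost.reshape(T, -1).argmax(axis=1)
    ys, xs = np.unravel_index(idx, (H, W))
    return np.stack([xs, ys], axis=-1).astype(np.float64)


def refine_track(query_feat, frame_feats, track, radius=1, iters=2, blend=0.5):
    """Stage 2 (sketch): update trajectory and query features from local correlations.

    Assumption: a local argmax plus feature blending replaces the learned
    refinement network; radius, iters and blend are illustrative values.
    """
    T, H, W, _ = frame_feats.shape
    track = track.copy()
    q = query_feat.copy()
    for _ in range(iters):
        sampled = []
        for t in range(T):
            x0, y0 = int(round(track[t, 0])), int(round(track[t, 1]))
            xa, xb = max(x0 - radius, 0), min(x0 + radius + 1, W)
            ya, yb = max(y0 - radius, 0), min(y0 + radius + 1, H)
            # Correlate the query with a small window around the current estimate.
            local = frame_feats[t, ya:yb, xa:xb] @ q
            dy, dx = np.unravel_index(local.argmax(), local.shape)
            track[t] = (xa + dx, ya + dy)
            sampled.append(frame_feats[t, ya + dy, xa + dx])
        # Blend the query with the features found along the refined track.
        q = (1 - blend) * q + blend * np.mean(sampled, axis=0)
    return track, q


# Toy video: one distinctive feature vector moves across an otherwise empty grid.
T, H, W, C = 3, 4, 5, 3
true_xy = np.array([[1, 2], [2, 2], [3, 1]], dtype=np.float64)
frame_feats = np.zeros((T, H, W, C))
for t, (x, y) in enumerate(true_xy.astype(int)):
    frame_feats[t, y, x] = [1.0, 0.0, 0.0]
query_feat = np.array([1.0, 0.0, 0.0])

init_track = match_per_frame(query_feat, frame_feats)
track, refined_query = refine_track(query_feat, frame_feats, init_track)
```

In this toy setting the matching stage already lands on the right cell in every frame, so refinement leaves the track unchanged. Starting refinement from a track that is off by one cell shows the local search pulling it back onto the target.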

Tasks

GPU, Motion Estimation, Point Tracking, Visual Tracking

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
DAVIS	TAPIR (Panning MOVi-E)	Average Jaccard	61.3	—	Unverified
DAVIS	TAPIR (MOVi-E)	Average Jaccard	59.8	—	Unverified
Kinetics	TAPIR (MOVi-E)	Average Jaccard	57.1	—	Unverified
Kinetics	TAPIR (Panning MOVi-E)	Average Jaccard	57.2	—	Unverified
Kubric	TAPIR (Panning MOVi-E)	Average Jaccard	84.7	—	Unverified
Kubric	TAPIR (MOVi-E)	Average Jaccard	84.3	—	Unverified
RGB-Stacking	TAPIR (MOVi-E)	Average Jaccard	66.2	—	Unverified
RGB-Stacking	TAPIR (Panning MOVi-E)	Average Jaccard	62.7	—	Unverified
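
For readers unfamiliar with the metric in the table, the sketch below shows one way Average Jaccard can be computed. It assumes the TAP-Vid convention of averaging a Jaccard score over pixel thresholds of 1, 2, 4, 8 and 16, where a true positive is a visible point predicted visible and within the threshold. The function name and array layout are hypothetical, and details of the official protocol (such as input resizing and query-frame handling) are omitted. The function returns a fraction in [0, 1]; the table reports percentages.

```python
import numpy as np


def average_jaccard(gt_xy, gt_visible, pred_xy, pred_visible,
                    thresholds=(1, 2, 4, 8, 16)):
    """Average Jaccard over distance thresholds (sketch, assumed TAP-Vid convention).

    gt_xy, pred_xy:           (N, 2) positions in pixels.
    gt_visible, pred_visible: (N,) boolean visibility flags.
    """
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)
    scores = []
    for thr in thresholds:
        within = dist < thr
        # True positives: visible, predicted visible, and close enough.
        tp = np.count_nonzero(within & gt_visible & pred_visible)
        # Ground-truth positives cover true positives plus false negatives.
        gt_pos = np.count_nonzero(gt_visible)
        # False positives: predicted visible but occluded or too far away.
        fp = np.count_nonzero(pred_visible & (~gt_visible | ~within))
        denom = gt_pos + fp
        scores.append(tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


# Four visible points; one prediction is 3 pixels off, the rest are exact.
gt_xy = np.array([[10.0, 10.0], [20.0, 20.0], [30.0, 30.0], [40.0, 40.0]])
pred_xy = gt_xy.copy()
pred_xy[0, 0] += 3.0
visible = np.ones(4, dtype=bool)

aj = average_jaccard(gt_xy, visible, pred_xy, visible)
```

The 3-pixel error counts as a miss at the two tightest thresholds and as a hit at the three looser ones, so small localization errors lower the score without zeroing it.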
