Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

2021-12-09 · CVPR 2022 · Code Available

Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Code

github.com/tensorflow/models
Official · In paper · tf · ★ 77,694
github.com/MindSpore-scientific-2/code-8/tree/main/temporal-normalizing-flows
mindspore · ★ 0

Abstract

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views. While being very effective on learning holistic image and video representations, such an objective becomes sub-optimal for learning spatio-temporally fine-grained features in videos, where scenes and instances evolve through space and time. In this paper, we present Contextualized Spatio-Temporal Contrastive Learning (ConST-CL) to effectively learn spatio-temporally fine-grained video representations via self-supervision. We first design a region-based pretext task which requires the model to transform instance representations from one view to another, guided by context features. Further, we introduce a simple network design that successfully reconciles the simultaneous learning process of both holistic and local representations. We evaluate our learned representations on a variety of downstream tasks and show that ConST-CL achieves competitive results on 6 datasets, including Kinetics, UCF, HMDB, AVA-Kinetics, AVA and OTB.
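
As a rough illustration of the region-based pretext task described in the abstract, the sketch below transforms region features from one view using context features from another view, then scores the result with a contrastive (InfoNCE-style) loss against the true region features of the second view. It is not the authors' implementation: the function names (`attend`, `info_nce`), the single-head dot-product attention, the residual connection, and the temperature value are all assumptions chosen for brevity.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context):
    """Assumed transform: single-head dot-product attention plus a residual.

    `query` is one region feature from view A; `context` is a list of
    context features from view B. The output is a predicted region
    feature for view B.
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, c) * scale for c in context])
    attended = [
        sum(w * c[i] for w, c in zip(weights, context))
        for i in range(len(query))
    ]
    return [q + a for q, a in zip(query, attended)]


def info_nce(predicted, targets, temperature=0.1):
    """Mean InfoNCE loss: predicted[i] should match targets[i],
    with all other targets acting as negatives."""
    preds = [l2_normalize(p) for p in predicted]
    tgts = [l2_normalize(t) for t in targets]
    loss = 0.0
    for i, p in enumerate(preds):
        probs = softmax([dot(p, t) / temperature for t in tgts])
        loss -= math.log(probs[i])
    return loss / len(preds)


# Toy data (hypothetical): three region features per view, dimension 3.
regions_view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
regions_view_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
context_view_b = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]

predicted_b = [attend(r, context_view_b) for r in regions_view_a]
loss = info_nce(predicted_b, regions_view_b)
```

In a real model the attention weights would be learned and the features would come from a video backbone; here the loss only shows how a predicted region feature is compared against its counterpart in the other view.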

Tasks

Action Localization, Action Recognition, Contrastive Learning, Object Tracking, Self-Supervised Learning, Spatio-Temporal Action Localization, Temporal Action Localization
