Unseen Action Recognition with Unpaired Adversarial Multimodal Learning

2019-05-01 · ICLR 2019

AJ Piergiovanni, Michael S. Ryoo

Abstract

In this paper, we present a method to learn a joint multimodal representation space that allows for the recognition of unseen activities in videos. We compare the effect of placing various constraints on the embedding space using paired text and video data. Additionally, we propose a method to improve the joint embedding space using an adversarial formulation with unpaired text and video data. In addition to testing on publicly available datasets, we introduce a new, large-scale text/video dataset. We experimentally confirm that learning such a shared embedding space benefits three difficult tasks: (i) zero-shot activity classification, (ii) unsupervised activity discovery, and (iii) unseen activity captioning.
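To make the idea of zero-shot classification in a shared text/video space concrete, here is a minimal sketch. It assumes that a video is labeled with the class whose text embedding is nearest by cosine similarity; the abstract does not state the decision rule, so this is an illustrative assumption rather than the authors' method. The function names, labels, and toy vectors are all hypothetical, and in practice the embeddings would come from learned video and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(video_embedding, class_text_embeddings):
    """Return the label whose text embedding is most similar to the video.

    Assumption: both embeddings already live in the same joint space.
    """
    return max(
        class_text_embeddings,
        key=lambda label: cosine(video_embedding, class_text_embeddings[label]),
    )


# Hypothetical toy embeddings for two classes never seen as labeled videos.
class_text_embeddings = {
    "surfing": [0.9, 0.1, 0.0],
    "juggling": [0.0, 0.2, 0.9],
}
video_embedding = [0.8, 0.3, 0.1]

predicted = zero_shot_classify(video_embedding, class_text_embeddings)
print(predicted)
```

Because only a text description is needed for each class, new activities can be added to `class_text_embeddings` without any labeled video for them, which is what makes the setting zero-shot.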

Tasks

Action Recognition, General Classification, Temporal Action Localization
