Actor-agnostic Multi-label Action Recognition with Multi-modal Query

2023-07-20 · Code Available

Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta


Code

github.com/mondalanindya/msqnet
Official · In paper · PyTorch · ★ 24

Abstract

Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code is made available at https://github.com/mondalanindya/MSQNet.

Tasks

Action Classification, Action Recognition, Action Recognition In Videos, Action Recognition on HMDB-51, Animal Action Recognition, Zero-Shot Action Recognition

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Animal Kingdom	MSQNet	mAP	73.1	—	Unverified
Charades	MSQNet	mAP	47.57	—	Unverified
HMDB51	MSQNet	Accuracy	69.43	—	Unverified
HMDB51	MSQNet	Accuracy	93.25	—	Unverified
Hockey	MSQNet	Accuracy	3.05	—	Unverified
THUMOS14	MSQNet	Accuracy	83.16	—	Unverified
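
The mAP rows above refer to mean average precision, the usual metric for multi-label recognition. The following is a minimal sketch of how such a score is commonly computed, assuming per-class average precision over ranked predictions, averaged over classes that have at least one positive sample. The function names are hypothetical, and the paper's exact evaluation protocol may differ from this.

```python
def average_precision(scores, labels):
    """Average precision for one class (assumed definition).

    scores: predicted confidences, one per sample.
    labels: 1 if the sample contains the action, else 0.
    Returns None when the class has no positive sample.
    """
    # Rank samples from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            # Precision measured at the rank of each positive sample.
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else None


def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP; rows are samples, columns are classes."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        ap = average_precision(
            [row[c] for row in score_matrix],
            [row[c] for row in label_matrix],
        )
        if ap is not None:  # skip classes with no positives
            aps.append(ap)
    if not aps:
        raise ValueError("no class has a positive sample")
    return sum(aps) / len(aps)


# Toy example: 3 clips, 2 action classes, multiple labels per clip allowed.
scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
labels = [[1, 0], [0, 1], [1, 1]]
map_value = mean_average_precision(scores, labels)
```

The result is a fraction in [0, 1]; the table reports such scores as percentages, so the value would be multiplied by 100 for comparison.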
