Guided Attention for Interpretable Motion Captioning
Karim Radouane, Julien Lagarde, Sylvie Ranwez, Andon Tchechmedjiev
- github.com/rd20karim/M2T-Interpretable (official, PyTorch)
Abstract
Diverse and extensive work has recently been conducted on text-conditioned human motion generation. However, progress in the reverse direction, motion captioning, has not seen comparable advancement. In this paper, we introduce a novel architecture design that enhances text generation quality by emphasizing interpretability through spatio-temporal and adaptive attention mechanisms. To encourage human-like reasoning, we propose methods for guiding attention during training, emphasizing the most relevant skeleton areas over time and distinguishing motion-related words. We discuss and quantify our model's interpretability using relevant histograms and density distributions. Furthermore, we leverage interpretability to derive fine-grained information about human motion, including action localization, body-part identification, and the distinction of motion-related words. Finally, we discuss the transferability of our approaches to other tasks. Our experiments demonstrate that attention guidance leads to interpretable captioning while improving performance over non-interpretable state-of-the-art systems with higher parameter counts. The code is available at: https://github.com/rd20karim/M2T-Interpretable.
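The attention-guidance idea lends itself to a compact illustration. Below is a minimal PyTorch sketch of one plausible guidance loss: it penalizes attention mass that falls outside a supervision mask marking the relevant body parts at each decoding step. The class name `GuidedAttentionLoss`, the tensor shapes, and the exact penalty form are illustrative assumptions, not taken from the paper's implementation.

```python
import torch
import torch.nn as nn


class GuidedAttentionLoss(nn.Module):
    """Penalize attention mass falling outside a guidance mask.

    Hypothetical sketch: `attn` holds softmax attention over body parts
    per decoding step; `mask` marks the parts annotated as relevant.
    """

    def __init__(self, weight: float = 1.0):
        super().__init__()
        self.weight = weight  # trade-off against the captioning loss

    def forward(self, attn: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # attn: (batch, steps, parts), each row sums to 1 (softmax output)
        # mask: (batch, steps, parts), 1.0 where a body part is relevant
        off_target = attn * (1.0 - mask)  # attention spent on irrelevant parts
        return self.weight * off_target.sum(dim=-1).mean()


# Usage sketch: combine with the usual captioning objective.
attn = torch.softmax(torch.randn(2, 5, 6), dim=-1)  # toy attention weights
mask = torch.zeros(2, 5, 6)
mask[..., :2] = 1.0                                  # pretend only two parts matter
guide = GuidedAttentionLoss(weight=0.5)
guidance_term = guide(attn, mask)                    # added to cross-entropy in practice
```

In training, such a term would typically be summed with the captioning cross-entropy, so the decoder is rewarded both for producing the reference words and for attending where the annotations say the motion happens.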
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| HumanML3D | ST-MLP | BLEU-4 | 25.0 | — | Unverified |
| KIT Motion-Language | ST-MLP | BLEU-4 | 24.4 | — | Unverified |