
Hierarchical Context-aware Network for Dense Video Event Captioning

2021-08-01 · ACL 2021 · Code Available

Lei Ji, Xianglin Guo, Haoyang Huang, Xilin Chen


Abstract

Dense video event captioning aims to generate a sequence of descriptive captions for each event in a long untrimmed video. Video-level context provides important information and helps the model generate consistent, less redundant captions across events. In this paper, we introduce a novel Hierarchical Context-aware Network for dense video event captioning (HCN) to capture context from various aspects. In detail, the model leverages local and global context with different mechanisms and jointly learns to generate coherent captions. The local context module performs full interaction between neighboring frames, while the global context module selectively attends to previous or future events. According to our extensive experiments on both the YouCook2 and ActivityNet Captions datasets, the video-level HCN model outperforms the event-level context-agnostic model by a large margin. The code is available at https://github.com/KirkGuo/HCN.
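The two context mechanisms the abstract describes can be illustrated with a minimal sketch: local context as full attention over a small window of neighboring frames, and global context as attention over other events with the current event masked out. This is a hypothetical NumPy simplification for intuition, not the paper's actual implementation (function names, the window size, and the masking scheme here are assumptions).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_context(frames, window=1):
    """Local context (sketch): each frame fully attends to its
    neighbors within a small window, mimicking full interaction
    between neighboring frames."""
    T, d = frames.shape
    out = np.zeros_like(frames)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        neigh = frames[lo:hi]                     # (w, d) neighboring frames
        scores = neigh @ frames[t] / np.sqrt(d)   # scaled dot-product scores
        out[t] = softmax(scores) @ neigh          # weighted sum of neighbors
    return out

def global_context(event_feats, idx):
    """Global context (sketch): event `idx` selectively attends to
    all *other* (previous or future) events; its own score is masked."""
    E, d = event_feats.shape
    scores = event_feats @ event_feats[idx] / np.sqrt(d)
    scores[idx] = -np.inf                         # mask out the event itself
    return softmax(scores) @ event_feats
```

With 5 frames of dimension 4, `local_context` returns a (5, 4) array of locally smoothed features, and `global_context(events, i)` returns a single d-dimensional vector summarizing the other events.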
