ER3: A Unified Framework for Event Retrieval, Recognition and Recounting

2017-07-01 · CVPR 2017

Zhanning Gao, Gang Hua, Dong-Qing Zhang, Nebojsa Jojic, Le Wang, Jianru Xue, Nanning Zheng

Abstract

We develop a unified framework for complex event retrieval, recognition and recounting. The framework is based on a compact video representation that exploits the temporal correlations in image features. Our feature alignment procedure identifies and removes feature redundancies across frames and outputs an intermediate tensor representation we call the video imprint. The video imprint is then fed into a reasoning network whose attention mechanism parallels that of the memory networks used in language modeling. The reasoning network simultaneously recognizes the event category and locates the key pieces of evidence for event recounting. In event retrieval tasks, we show that the compact video representation aggregated from the video imprint achieves significantly better retrieval accuracy than existing methods. We also set new state-of-the-art results in event recognition tasks, with an additional benefit: the latent structure in our reasoning network highlights the areas of the video imprint and can be used directly for event recounting. Because the video imprint maps back to locations in the video frames, the network identifies not only key frames but also the specific areas inside each frame that are most influential to the decision process.
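
The abstract compares the reasoning network's attention to that of memory networks. As a rough illustration only, the following is a minimal sketch of a generic end-to-end memory-network read over a video imprint flattened into a list of cell feature vectors. The function names (`attend`, `reason`), the single shared embedding for keys and values, the query vector, the linear classifier `class_weights` and the additive hop update are all hypothetical choices made for this sketch; they are not the ER3 architecture described in the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(memory: Sequence[Sequence[float]], query: Sequence[float]) -> Tuple[Vector, Vector]:
    """One attention read: p_i = softmax(q . m_i), o = sum_i p_i * m_i.

    Assumption: the same cell vector serves as both key and value.
    """
    weights = softmax([dot(cell, query) for cell in memory])
    output = [sum(w * cell[d] for w, cell in zip(weights, memory)) for d in range(len(query))]
    return output, weights


def reason(memory, query, class_weights, hops: int = 1) -> Tuple[Vector, Vector]:
    """Hypothetical multi-hop reasoning over imprint cells.

    Returns class probabilities and the attention weights of the last hop,
    which act as a per-cell relevance map (candidate recounting evidence).
    """
    u = list(query)
    weights: Vector = []
    for _ in range(hops):
        o, weights = attend(memory, u)
        u = [ui + oi for ui, oi in zip(u, o)]  # memory-network style update: u_{k+1} = u_k + o_k
    probs = softmax([dot(w, u) for w in class_weights])  # hypothetical linear classifier
    return probs, weights


# Toy imprint with three cells of 2-D features (made-up numbers for illustration).
memory = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
probs, weights = reason(memory, query=[5.0, 0.0], class_weights=[[1.0, 0.0], [0.0, 1.0]])
key_cell = max(range(len(weights)), key=weights.__getitem__)
print(key_cell)  # 0
```

The attention weights from the last hop give one score per imprint cell. Since each cell of the video imprint maps back to a location in the video frames, ranking cells by weight is one plausible way to surface the frames and regions that most influenced the decision.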

Tasks

Language Modeling, Retrieval