ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering

2019-06-06

Zhou Yu, Dejing Xu, Jun Yu, Ting Yu, Zhou Zhao, Yueting Zhuang, DaCheng Tao

Code

github.com/MILVLG/activitynet-qa
Official, in paper

Abstract

Recent developments in modeling language and vision have been successfully applied to image question answering. It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA). Compared to the image domain, where large-scale and fully annotated benchmark datasets exist, VideoQA datasets are small in scale and automatically generated, among other limitations. These limitations restrict their applicability in practice. Here we introduce ActivityNet-QA, a fully annotated and large-scale VideoQA dataset. The dataset consists of 58,000 QA pairs on 5,800 complex web videos derived from the popular ActivityNet dataset. We present a statistical analysis of our ActivityNet-QA dataset and conduct extensive experiments on it by comparing existing VideoQA baselines. Moreover, we explore various video representation strategies to improve VideoQA performance, especially for long videos. The dataset is available at https://github.com/MILVLG/activitynet-qa

Tasks

Question Answering, Video Question Answering, Visual Question Answering (VQA), Zero-Shot Video Question Answer

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ActivityNet-QA	E-SA	Accuracy	31.8	—	Unverified
ActivityNet-QA	E-MN	Accuracy	27.1	—	Unverified
ActivityNet-QA	E-VQA	Accuracy	25.1	—	Unverified
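
The table reports an Accuracy figure for each baseline. As a rough illustration of how such a percentage can be computed, the sketch below scores predictions by exact string match against reference answers. The exact-match criterion, the normalization step, and the names `normalize` and `exact_match_accuracy` are assumptions made for illustration; the official evaluation protocol is defined in the dataset repository and may differ.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(answer.lower().split())


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of predictions that equal their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Toy answers for illustration only; these are not taken from the dataset.
preds = ["yes", "Two", "blue", "outdoor"]
refs = ["yes", "two", "red", "indoor"]
print(exact_match_accuracy(preds, refs))  # 2 of 4 match, prints 50.0
```

Returning a percentage rather than a fraction keeps the output on the same scale as the Claimed column above.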
