VQA: Visual Question Answering
Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh
Code Available
- github.com/vipulgupta1011/swapmix (pytorch, ★ 20)
- github.com/mkhalil1998/EC601_Group_Project (pytorch, ★ 2)
- github.com/mokhalid-dev/Attention-based-VQA-model (pytorch, ★ 0)
- github.com/ramprs/grad-cam (torch, ★ 0)
- github.com/yanxinyan1/yxy (pytorch, ★ 0)
- github.com/moh833/VQA (none, ★ 0)
- github.com/SatyamGaba/vqa (pytorch, ★ 0)
- github.com/SatyamGaba/visual_question_answering (pytorch, ★ 0)
- github.com/tbmoon/basic_vqa (pytorch, ★ 0)
- github.com/SuchismitaSahu1993/VQA-System (none, ★ 0)
Abstract
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).
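The abstract notes that VQA is amenable to automatic evaluation because open-ended answers are short. In the paper's evaluation, each question comes with ten human answers, and a predicted answer is counted as fully correct when at least three annotators gave it. A minimal sketch of that consensus metric (the function name and the light lowercase/whitespace normalization are ours; the official evaluation script applies more elaborate processing of punctuation, articles, and number words):

```python
def vqa_accuracy(predicted, human_answers):
    """Consensus accuracy for open-ended VQA.

    An answer scores min(#matching human answers / 3, 1), so it is
    fully correct if at least 3 of the (typically 10) human answers
    agree with it. Matching here is simple normalized string equality.
    """
    norm = lambda a: a.strip().lower()
    matches = sum(1 for a in human_answers if norm(a) == norm(predicted))
    return min(matches / 3.0, 1.0)
```

Averaging this per-question score over a test split yields the "Percentage correct" numbers reported in the benchmark table below.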
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| COCO Visual Question Answering (VQA) abstract images 1.0 multiple choice | Dualnet ensemble | Percentage correct | 71.18 | — | Unverified |
| COCO Visual Question Answering (VQA) abstract images 1.0 multiple choice | LSTM + global features | Percentage correct | 69.21 | — | Unverified |
| COCO Visual Question Answering (VQA) abstract images 1.0 multiple choice | LSTM blind | Percentage correct | 61.41 | — | Unverified |
| COCO Visual Question Answering (VQA) abstract images 1.0 open ended | Dualnet ensemble | Percentage correct | 69.73 | — | Unverified |
| COCO Visual Question Answering (VQA) abstract images 1.0 open ended | LSTM + global features | Percentage correct | 65.02 | — | Unverified |
| COCO Visual Question Answering (VQA) abstract images 1.0 open ended | LSTM blind | Percentage correct | 57.19 | — | Unverified |
| COCO Visual Question Answering (VQA) real images 1.0 multiple choice | LSTM Q+I | Percentage correct | 63.1 | — | Unverified |
| COCO Visual Question Answering (VQA) real images 1.0 open ended | LSTM Q+I | Percentage correct | 58.2 | — | Unverified |
| COCO Visual Question Answering (VQA) real images 2.0 open ended | HDU-USYD-UNCC | Percentage correct | 68.16 | — | Unverified |
| COCO Visual Question Answering (VQA) real images 2.0 open ended | DLAIT | Percentage correct | 68.07 | — | Unverified |