Simple Baseline for Visual Question Answering
2015-12-07
Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus
Code
- github.com/metalbubble/VQAbaseline (official, in paper; framework: none; ★ 0)
- github.com/karunraju/VQA (PyTorch; ★ 0)
- github.com/sidaw/nbsvm (framework: none; ★ 0)
- github.com/yikang-li/iqan (PyTorch; ★ 0)
- github.com/sidgan/whats_in_a_question (framework: none; ★ 0)
- github.com/SkyOL5/VQA-CoAttention (PyTorch; ★ 0)
- github.com/miohana/vqa (TensorFlow; ★ 0)
Abstract
We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strengths and weaknesses of the trained model, we also provide an interactive web demo and open-source code.
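The baseline described in the abstract can be sketched in a few lines: sum one-hot word vectors into a bag-of-words question feature, concatenate it with a CNN image feature, and feed the result to a linear softmax classifier over answer classes. The sketch below uses NumPy with assumed dimensions (vocabulary size, CNN feature width, number of answer classes) and hypothetical function names; the authors' actual implementation is the Torch code at github.com/metalbubble/VQAbaseline.

```python
import numpy as np

# Assumed dimensions, not taken from the paper.
VOCAB_SIZE = 1000    # question-word vocabulary
IMG_DIM = 4096       # CNN image feature, e.g. a VGG fc7-style vector
NUM_ANSWERS = 1000   # most frequent answer classes

rng = np.random.default_rng(0)
# Single linear layer over the concatenated feature (untrained here).
W = rng.standard_normal((VOCAB_SIZE + IMG_DIM, NUM_ANSWERS)) * 0.01
b = np.zeros(NUM_ANSWERS)

def bag_of_words(token_ids, vocab_size=VOCAB_SIZE):
    """Sum one-hot word vectors into a bag-of-words question feature."""
    bow = np.zeros(vocab_size)
    for t in token_ids:
        bow[t] += 1.0
    return bow

def predict(token_ids, img_feat):
    """Concatenate question BoW and image features, apply a softmax classifier."""
    x = np.concatenate([bag_of_words(token_ids), img_feat])
    logits = x @ W + b
    e = np.exp(logits - logits.max())   # stable softmax
    return e / e.sum()

# Toy usage: a 5-word question and a random stand-in for the CNN feature.
probs = predict([3, 17, 42, 7, 99], rng.standard_normal(IMG_DIM))
```

In a trained model, `W` and `b` would be learned with a cross-entropy loss over the answer classes; the point of the sketch is that nothing more than feature concatenation plus a linear classifier is involved.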
Tasks
- Visual Question Answering (VQA)
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| COCO Visual Question Answering (VQA) real images 1.0 multiple choice | iBOWIMG baseline | Percentage correct | 62 | — | Unverified |
| COCO Visual Question Answering (VQA) real images 1.0 open ended | iBOWIMG baseline | Percentage correct | 55.9 | — | Unverified |