High-Order Attention Models for Visual Question Answering

2017-11-12 · NeurIPS 2017

Idan Schwartz, Alexander G. Schwing, Tamir Hazan

Code

github.com/idansc/HighOrderAtten
Official · In paper · torch

Abstract

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
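The abstract describes attention driven by high-order correlations across modalities, and the benchmark entry below refers to a unary + pairwise + ternary variant over three modalities. The following is a minimal sketch of that idea under stated assumptions, not the authors' formulation or the official torch code. It assumes every modality has already been embedded into a shared dimension, uses a dot product as the pairwise correlation and a trilinear product as the ternary correlation, marginalizes over the other modalities by simple averaging, and normalizes each modality's scores with a softmax. The function names (`high_order_attention`, `attend`) and the per-modality `unary_weights` are hypothetical, and treating the three modalities as image, question, and answer is an assumption for illustration.

```python
import math
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def pairwise(x, y):
    # Assumed pairwise potential: correlation of two elements from different modalities.
    return dot(x, y)


def ternary(x, y, z):
    # Assumed ternary potential: trilinear correlation sum_k x_k * y_k * z_k.
    return sum(a * b * c for a, b, c in zip(x, y, z))


def high_order_attention(modalities, unary_weights):
    """Return one attention distribution per modality (hypothetical sketch).

    modalities: list of modalities, each a list of feature vectors sharing dimension d.
    unary_weights: one hypothetical weight vector (length d) per modality.
    """
    n = len(modalities)
    attentions = []
    for i, elems in enumerate(modalities):
        scores = []
        for x in elems:
            s = dot(unary_weights[i], x)  # unary potential
            # Pairwise potentials, each averaged over the other modality's elements.
            for j in range(n):
                if j != i:
                    others = modalities[j]
                    s += sum(pairwise(x, y) for y in others) / len(others)
            # Ternary potential, averaged over all element pairs of the other two modalities.
            if n == 3:
                j, k = [m for m in range(3) if m != i]
                pairs = list(product(modalities[j], modalities[k]))
                s += sum(ternary(x, y, z) for y, z in pairs) / len(pairs)
            scores.append(s)
        attentions.append(softmax(scores))
    return attentions


def attend(elems, weights):
    # Attention-weighted sum of a modality's feature vectors.
    d = len(elems[0])
    return [sum(w * e[k] for w, e in zip(weights, elems)) for k in range(d)]


# Toy usage: three modalities with 2-d features (hypothetical values).
image = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # image regions
question = [[2.0, 0.0], [1.0, 0.0]]           # question words
answer = [[1.0, 0.0], [0.0, 1.0]]             # candidate answers
weights = [[0.0, 0.0] for _ in range(3)]      # unary terms switched off
att = high_order_attention([image, question, answer], weights)
image_summary = attend(image, att[0])
```

In this toy input the first image region correlates most strongly with the question words and, jointly, with the question and answers, so it receives the largest image attention weight. Averaging is only one simple way to collapse the pairwise and ternary terms into a per-element score; a model could instead weight or learn these combinations.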

Tasks

Question Answering · Visual Question Answering · Visual Question Answering (VQA)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
COCO Visual Question Answering (VQA) real images 1.0 multiple choice	3-Modalities: Unary + Pairwise + Ternary (ResNet)	Percentage correct	69.3	—	Unverified