Towards Knowledge-Augmented Visual Question Answering

2020-12-01 · COLING 2020 · Code Available

Maryam Ziaeefard, Freddy Lecue

Abstract

Visual Question Answering (VQA) remains algorithmically challenging, although it is effortless for humans. Humans combine visual observations with general and commonsense knowledge to answer questions about a given image. In this paper, we address the problem of incorporating general knowledge into VQA models while leveraging the visual information. We propose a model that captures the interactions between objects in a visual scene and entities in an external knowledge source. Our model is a graph-based approach that combines scene graphs with concept graphs and learns a question-adaptive graph representation of related knowledge instances. We use Graph Attention Networks to assign higher importance to the key knowledge instances that are most relevant to each question. We use ConceptNet as the source of general knowledge and evaluate the performance of our model on the challenging OK-VQA dataset.
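
The idea of weighting knowledge instances by their relevance to the question can be illustrated with a minimal sketch of one attention step over a small graph. This is not the authors' architecture: the function name, the element-wise gating of node features by the question vector, the toy node names, and all numeric values are assumptions made for illustration only, and the attention vector that would normally be learned is fixed by hand.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_conditioned_attention(features, neighbors, question, attn_vec):
    """One simplified graph-attention step (illustrative, hypothetical helper).

    features:  dict mapping node name -> feature vector of length d
    neighbors: dict mapping node name -> list of neighbor names (self included)
    question:  question embedding of length d (assumed to share the node space)
    attn_vec:  attention vector of length 2 * d (learned in a real model)
    Returns (updated_features, attention_weights).
    """
    updated, weights = {}, {}
    for i, nbrs in neighbors.items():
        # Gate node features by the question so scores depend on the question.
        gated_i = [h * q for h, q in zip(features[i], question)]
        scores = []
        for j in nbrs:
            gated_j = [h * q for h, q in zip(features[j], question)]
            concat = gated_i + gated_j
            scores.append(leaky_relu(sum(a * x for a, x in zip(attn_vec, concat))))
        alpha = softmax(scores)
        dim = len(features[i])
        # New representation: attention-weighted sum of neighbor features.
        updated[i] = [
            sum(w * features[j][k] for w, j in zip(alpha, nbrs)) for k in range(dim)
        ]
        weights[i] = dict(zip(nbrs, alpha))
    return updated, weights


# Toy graph: one scene object ("dog") linked to two invented knowledge nodes.
features = {"dog": [1.0, 0.0], "pet": [0.8, 0.2], "frisbee": [0.0, 1.0]}
neighbors = {
    "dog": ["dog", "pet", "frisbee"],
    "pet": ["pet", "dog"],
    "frisbee": ["frisbee", "dog"],
}
attn_vec = [1.0, 1.0, 1.0, 1.0]

_, weights_a = question_conditioned_attention(features, neighbors, [0.0, 1.0], attn_vec)
_, weights_b = question_conditioned_attention(features, neighbors, [1.0, 0.0], attn_vec)
```

With the first question vector, the "dog" node attends most strongly to "frisbee"; with the second, it attends more to "pet" than to "frisbee". The same graph therefore yields different neighbor weightings for different questions, which is the behavior a question-adaptive representation is meant to capture.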
