In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering

2021-08-01 · ACL 2021

Peter Vickers, Nikolaos Aletras, Emilio Monti, Loïc Barrault

Abstract

Visual Question Answering (VQA) methods aim at leveraging visual input to answer questions that may require complex reasoning over entities. Current models are trained on labelled data that may be insufficient to learn complex knowledge representations. In this paper, we propose a new method to enhance the reasoning capabilities of a multi-modal pretrained model (Vision+Language BERT) by integrating facts extracted from an external knowledge base. Evaluation on the KVQA dataset benchmark demonstrates that our method outperforms competitive baselines by 19%, achieving new state-of-the-art results. We also perform an extensive analysis highlighting the limitations of our best-performing model through an ablation study.

Tasks

Question Answering, Visual Question Answering, Visual Question Answering (VQA)
