
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities

2021-11-16 · ACL ARR November 2021 · Code Available

Anonymous


Abstract

Whether to retrieve, answer, translate or reason, multimodality opens up new challenges and perspectives. In this context, we are interested in Knowledge-based Visual Question Answering about named Entities (KVQAE). To benchmark the task, we provide ViQuAE, a dataset of 3.7K questions paired with images. This is the first KVQAE dataset to cover a wide range of entity types (e.g. persons, landmarks, products). The dataset is annotated using a semi-automatic method that could be extended to larger data scales. We also propose a Knowledge Base (KB) based on Wikipedia composed of 1.5M articles paired with images. To set a baseline on the benchmark, we address KVQAE as a two-stage problem: Information Retrieval (IR) and Reading Comprehension (RC). IR is carried out with a combination of face recognition, image retrieval and text retrieval while RC is purely text-based. The experiments empirically demonstrate the difficulty of the task. This work paves the way towards better multimodal entity representations and question answering. The dataset, KB and code will be available at https://github.com/Anonymous/ViQuAE.
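The abstract describes a two-stage baseline: Information Retrieval fuses scores from face recognition, image retrieval, and text retrieval, and Reading Comprehension then operates on the retrieved text alone. As a minimal sketch of the first stage only, the fusion step could look like the following; the function names and the equal-ish weights are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the IR stage of the two-stage KVQAE baseline:
# fuse per-article scores from three channels (face recognition, image
# retrieval, text retrieval) with a weighted sum, then rank articles.
# Weights and function names are illustrative assumptions.

def combine_retrieval_scores(face, image, text, weights=(0.4, 0.3, 0.3)):
    """Fuse per-article scores from the three retrieval channels.

    Each argument maps an article identifier to a relevance score in [0, 1];
    missing articles are treated as scoring 0 for that channel.
    """
    wf, wi, wt = weights
    articles = set(face) | set(image) | set(text)
    return {
        a: wf * face.get(a, 0.0) + wi * image.get(a, 0.0) + wt * text.get(a, 0.0)
        for a in articles
    }

def retrieve_top_k(face, image, text, k=1):
    """Return the k best articles under the fused score."""
    fused = combine_retrieval_scores(face, image, text)
    return sorted(fused, key=fused.get, reverse=True)[:k]
```

In practice the top-ranked article(s) would be handed to a purely text-based reader, which extracts the answer span from the article given the question; the fusion weights would be tuned on a validation split rather than fixed as here.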
