CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

2022-04-05

Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata

Code

github.com/explainableml/clevr-x
Official · In paper · PyTorch · ★ 29

Abstract

Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at https://explainableml.github.io/CLEVR-X/.

Tasks

Explanation Generation, Question Answering, Text Generation, Visual Question Answering (VQA), Visual Reasoning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CLEVR-X	PJ-X	B4	87.4	—	Unverified
CLEVR-X	FM	B4	78.8	—	Unverified
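
The B4 column most likely denotes BLEU-4, one of the NLG metrics the abstract refers to when it discusses varying the number of ground-truth explanations. The following is a minimal sketch of sentence-level BLEU-4 with multiple references, written from the standard definition and not taken from the paper's evaluation code. The whitespace tokenization, the absence of smoothing, the function names `ngrams` and `bleu4`, and the example sentences are all assumptions made for illustration. The sketch returns a score in [0, 1], whereas the table reports values on a 0 to 100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4, uniform weights, no smoothing (illustrative only)."""
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference whose length is closest to the candidate.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / 4)


# Hypothetical explanations, not drawn from CLEVR-X.
cand = "the red cube is left of the sphere".split()
ref_a = "the red cube is behind the sphere".split()
ref_b = "the red cube is left of the sphere".split()

one_ref = bleu4(cand, [ref_a])
two_refs = bleu4(cand, [ref_a, ref_b])
```

In this toy case the score rises once a second, better-matching reference is added, which hints at why the number of ground-truth explanations per question can affect reported metric values.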
