Answering Questions about Data Visualizations using Efficient Bimodal Fusion

2019-08-05

Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan

Code

github.com/kushalkafle/PREFIL
Official (PyTorch)

Abstract

Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs. CQA requires capabilities that natural-image VQA algorithms lack: fine-grained measurements, optical character recognition, and handling out-of-vocabulary words in both questions and answers. Without modifications, state-of-the-art VQA algorithms perform poorly on this task. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the-art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart.
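The abstract describes two stages: fuse question and image features into bimodal embeddings, then aggregate those embeddings to produce an answer. The following is a minimal pure-Python sketch of that general idea under our own assumptions, not the official PReFIL code (the official implementation is the PyTorch repository listed above). We assume the question has already been encoded as a vector and the chart as a grid of feature vectors. The question vector is concatenated to every grid cell and passed through one shared linear layer with ReLU, which is equivalent to a 1x1 convolution. A plain tanh RNN then aggregates the cells, and a linear classifier scores a fixed answer vocabulary. All helper names (`linear`, `make_params`, `fuse`, `aggregate`, `answer_logits`), dimensions, and random weights are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute W @ x + b, where W has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_params(rng: random.Random, img_dim: int, q_dim: int, fused_dim: int,
                hidden_dim: int, num_answers: int) -> Dict[str, Matrix]:
    """Hypothetical randomly initialised weights (a trained model would learn these)."""
    return {
        "w_fuse": random_matrix(rng, fused_dim, img_dim + q_dim),
        "b_fuse": [0.0] * fused_dim,
        "w_in": random_matrix(rng, hidden_dim, fused_dim),
        "w_hid": random_matrix(rng, hidden_dim, hidden_dim),
        "b_hid": [0.0] * hidden_dim,
        "w_out": random_matrix(rng, num_answers, hidden_dim),
        "b_out": [0.0] * num_answers,
    }


def fuse(feature_map: List[List[Vector]], question: Vector, params) -> List[Vector]:
    """Bimodal fusion: append the question vector to every spatial cell and apply
    one shared linear layer + ReLU (equivalent to a 1x1 convolution)."""
    fused = []
    for row in feature_map:
        for cell in row:
            z = linear(params["w_fuse"], params["b_fuse"], cell + question)
            fused.append([max(0.0, v) for v in z])
    return fused


def aggregate(fused: List[Vector], params) -> Vector:
    """Aggregate the per-cell embeddings with a plain tanh RNN, scanning cells
    in row-major order; return the final hidden state."""
    hidden = [0.0] * len(params["b_hid"])
    for emb in fused:
        from_input = linear(params["w_in"], params["b_hid"], emb)
        from_hidden = linear(params["w_hid"], [0.0] * len(hidden), hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]
    return hidden


def answer_logits(feature_map: List[List[Vector]], question: Vector, params) -> Vector:
    """Score every answer in a fixed (hypothetical) answer vocabulary."""
    hidden = aggregate(fuse(feature_map, question, params), params)
    return linear(params["w_out"], params["b_out"], hidden)


# Usage: a 2x3 feature grid with 4 channels, a 3-d question vector, 5 candidate answers.
rng = random.Random(0)
params = make_params(rng, img_dim=4, q_dim=3, fused_dim=8, hidden_dim=6, num_answers=5)
feature_map = [[[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(3)] for _ in range(2)]
question = [0.2, -0.1, 0.7]
logits = answer_logits(feature_map, question, params)
best = max(range(len(logits)), key=logits.__getitem__)
```

Concatenating the question at every cell lets each local embedding depend on the question before anything is pooled, so the aggregation step sees question-specific features rather than a generic image summary. A fixed answer vocabulary, as assumed here, cannot by itself produce out-of-vocabulary answers, which the abstract lists as a requirement for CQA; this sketch does not attempt to address that.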

Tasks

Chart Question Answering, Optical Character Recognition (OCR), Question Answering, Visual Question Answering (VQA)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
DVQA test-familiar	PReFIL (Oracle OCR)	1:1 Accuracy	96.37	—	Unverified
FigureQA - test 1	PReFIL	1:1 Accuracy	94.88	—	Unverified
PlotQA-D1	PReFIL	1:1 Accuracy	57.91	—	Unverified
PlotQA-D2	PReFIL	1:1 Accuracy	10.37	—	Unverified
