SOTAVerified

DocVQA: A Dataset for VQA on Document Images

2020-07-01

Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar


Abstract

We present a new dataset for Visual Question Answering (VQA) on document images, called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. We present a detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension, and report several baseline results by adapting existing VQA and reading comprehension models. Although the existing models perform reasonably well on certain types of questions, there is a large performance gap compared to human performance (94.36% accuracy). The models particularly need to improve on questions where understanding the structure of the document is crucial. The dataset, code and leaderboard are available at docvqa.org


Benchmark Results

Dataset       Model                                       Metric    Claimed  Verified  Status
DocVQA test   Human                                       ANLS      0.94     -         Unverified
DocVQA test   BERT_LARGE_SQUAD_DOCVQA_FINETUNED_Baseline  ANLS      0.67     -         Unverified
DocVQA val    BERT LARGE Baseline                         Accuracy  54.48    -         Unverified
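The ANLS column above refers to Average Normalized Levenshtein Similarity, the evaluation metric used by the DocVQA benchmark: each prediction is scored as 1 minus the normalized edit distance to the closest ground-truth answer, zeroed out when the distance reaches a threshold of 0.5, then averaged over questions. A minimal sketch of that computation (function names and the lowercasing/stripping normalization are illustrative, not the official evaluation script):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(predictions, gold_answers, tau=0.5):
    """Average Normalized Levenshtein Similarity.

    predictions:  list of predicted answer strings, one per question
    gold_answers: list of lists of acceptable ground-truth strings
    tau:          threshold above which a normalized distance scores 0
    """
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        best = 0.0
        for gold in golds:
            p, g = pred.strip().lower(), gold.strip().lower()
            nl = levenshtein(p, g) / max(len(p), len(g), 1)
            if nl < tau:  # close enough to count as a (partial) match
                best = max(best, 1.0 - nl)
        total += best
    return total / len(predictions)
```

For example, an exact match scores 1.0, a prediction differing by one character in four scores 0.75, and a completely different answer scores 0.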

Reproductions