End-to-end Document Recognition and Understanding with Dessurt

2022-03-30

Brian Davis, Bryan Morse, Bryan Price, Chris Tensmeyer, Curtis Wigington, Vlad Morariu

Code

github.com/herobd/dessurt (official, in paper, PyTorch, ★ 62)
github.com/herobd/NAF_dataset (framework: none, ★ 38)

Abstract

We introduce Dessurt, a relatively simple document understanding transformer capable of being fine-tuned on a greater variety of document tasks than prior methods. It receives a document image and task string as input and generates arbitrary text autoregressively as output. Because Dessurt is an end-to-end architecture that performs text recognition in addition to document understanding, it does not require an external recognition model as prior methods do. Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks. We show that this model is effective at 9 different dataset-task combinations.

Tasks

Document Understanding, Visual Question Answering (VQA)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
DocVQA test	Dessurt	ANLS	0.63	—	Unverified
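
The ANLS column above is the Average Normalized Levenshtein Similarity used for DocVQA. The sketch below shows how such a score is typically computed: for each question, take the best similarity over the accepted answers, zero it out when the normalized edit distance reaches 0.5, then average over questions. The function names, the lowercase-and-strip normalization, and the sample strings are assumptions made for illustration; this is not the official evaluation script or the code from the Dessurt repository.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def anls(predictions, references, threshold=0.5):
    """Average Normalized Levenshtein Similarity.

    predictions: list of predicted answer strings, one per question.
    references:  list of lists of accepted answers, one list per question.
    threshold:   normalized distances at or above this score zero.
    """
    scores = []
    for pred, golds in zip(predictions, references):
        # Assumed normalization: case-insensitive, surrounding whitespace ignored.
        p = pred.strip().lower()
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            sim = 1.0 - nl if nl < threshold else 0.0
            best = max(best, sim)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical predictions and answers, not taken from DocVQA.
    preds = ["Dessurt", "abc"]
    golds = [["dessurt"], ["xyz"]]
    print(anls(preds, golds))
```

Under this definition a reported ANLS of 0.63 is a mean over per-question similarities, so it mixes exact matches, near misses that receive partial credit, and answers that score zero.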
