TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

2021-09-21 · Code Available

Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

Code

github.com/microsoft/unilm/tree/master/trocr (Official, pytorch, ★ 0)
github.com/oleehyo/texteller (paddle, ★ 726)
github.com/d-gurgurov/im2latex (pytorch, ★ 19)
github.com/prathameshza/TrOCR_FineTuning (none, ★ 8)
github.com/pwc-1/Paper-10/tree/main/trocr (mindspore, ★ 0)
github.com/pwc-1/Paper-9/tree/main/1/trocr (mindspore, ★ 0)
github.com/MindCode-4/code-5/tree/main/trocr (mindspore, ★ 0)

Abstract

Text recognition is a long-standing research problem for document digitalization. Existing approaches are usually built based on CNN for image understanding and RNN for char-level text generation. In addition, another language model is usually needed to improve the overall accuracy as a post-processing step. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on the printed, handwritten and scene text recognition tasks. The TrOCR models and code are publicly available at https://aka.ms/trocr.

Tasks

Handwritten Text Recognition, Language Modeling, Optical Character Recognition (OCR), Scene Text Recognition, Text Generation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
IAM	TrOCR-large 558M	CER	2.89	—	Unverified
IAM	TrOCR-base 334M	CER	3.42	—	Unverified
IAM	TrOCR-small 62M	CER	4.22	—	Unverified
IAM(line-level)	TrOCR	Test CER	3.4	—	Unverified
LAM(line-level)	TrOCR	Test CER	3.6	—	Unverified
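
The table reports CER (character error rate) without defining it. As a rough illustration, CER is commonly computed as the character-level edit distance between the recognized text and the reference, divided by the number of reference characters. The sketch below assumes that standard definition, corpus-level aggregation, and percentage scaling; the exact evaluation protocol behind the numbers above (text normalization, case handling, per-line versus corpus averaging) is not stated on this page, and the function names `edit_distance` and `cer` are illustrative only.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER in percent (assumed convention):
    total edit distance divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    # One missing character out of 11 reference characters.
    print(round(cer(["hello world"], ["hello wrld"]), 2))  # 9.09
```

Under this convention, a CER of 2.89 would correspond to roughly 3 character edits per 100 reference characters.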
