Document Enhancement System Using Auto-encoders

2019-09-14 · NeurIPS Workshop Document_Intelligen 2019

Mehrdad J. Gangeh, Sunil R. Tiyyagura, Sridhar V. Dasaratha, Hamid Motahari, Nigel P. Duffy

Abstract

Scanned documents are converted to digital form using Optical Character Recognition (OCR) software. This work focuses on improving the quality of scanned documents in order to improve the OCR output. We create an end-to-end document enhancement pipeline that takes in a set of noisy documents and produces clean ones. Denoising auto-encoders based on deep neural networks are trained to improve OCR quality. We train a blind model that works across different noise levels in scanned text documents. Results are shown for the removal of blurring and watermark noise from scanned documents.
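A denoising auto-encoder of the kind described above is usually trained on pairs of (noisy input, clean target). The abstract does not say how such pairs were obtained, so the following is only a minimal sketch under stated assumptions: pages are small grayscale grids with values in [0, 1], blur is approximated by a box filter, the watermark is alpha-blended onto the page, and the "blind" setting is imitated by drawing the noise type and level at random. The names `box_blur`, `add_watermark`, and `make_blind_pairs` are hypothetical and do not come from the paper.

```python
import random

def box_blur(img, radius):
    """Mean filter over a (2*radius+1) square window; edges use available pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out

def add_watermark(img, mark, alpha):
    """Alpha-blend a watermark onto the page; alpha=0 leaves the page unchanged."""
    return [
        [(1 - alpha) * p + alpha * m for p, m in zip(page_row, mark_row)]
        for page_row, mark_row in zip(img, mark)
    ]

def make_blind_pairs(clean, mark, n, seed=0):
    """Build n (noisy, clean) pairs with a randomly chosen noise type and level.

    Assumption: the ranges below (blur radius 1-3, alpha 0.1-0.5) are
    illustrative only and are not taken from the paper.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        if rng.random() < 0.5:
            noisy = box_blur(clean, rng.randint(1, 3))
        else:
            noisy = add_watermark(clean, mark, rng.uniform(0.1, 0.5))
        pairs.append((noisy, clean))  # the target is always the clean page
    return pairs
```

Because the noise level is not passed to the model, a network trained on such pairs has to handle every level it sees, which is one common reading of "blind" denoising.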
