OCR Quality and NLP Preprocessing
2019-08-01 · WS 2019
Margot Mieskes, Stefan Schmunk
Abstract
We present initial experiments evaluating the performance of tasks such as part-of-speech tagging on data corrupted by Optical Character Recognition (OCR). Our results on English and German data, obtained from artificially corrupted text as well as from initial real OCRed data, indicate that even a small drop in OCR quality considerably increases error rates, which would have a significant impact on subsequent processing steps.
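The abstract does not specify how the artificial corruption was produced or which tagger was used. The following is only a minimal sketch, under our own assumptions, of how such an experiment could be set up: characters are swapped for OCR look-alikes at a chosen rate, the character error rate (CER, edit distance divided by reference length) is measured, and tag agreement is compared between clean and corrupted input. The confusion table `CONFUSIONS`, the toy `LEXICON` tagger and all function names are hypothetical illustrations, not the authors' method; real OCR also produces segmentation errors (merged or split tokens), which this sketch ignores.

```python
import random

# Hypothetical OCR confusion pairs; real error profiles depend on the OCR engine, font and scan quality.
CONFUSIONS = {"e": "c", "l": "1", "o": "0", "i": "l", "n": "u", "s": "5"}

# Hypothetical toy lexicon tagger standing in for a real POS tagger.
LEXICON = {"the": "DET", "old": "ADJ", "man": "NOUN", "sells": "VERB", "fine": "ADJ", "lemons": "NOUN"}


def corrupt(text: str, rate: float, seed: int = 0) -> str:
    """Replace each confusable character with its look-alike with probability `rate`."""
    rng = random.Random(seed)  # seeded so experiments are repeatable
    out = []
    for ch in text:
        if ch in CONFUSIONS and rng.random() < rate:
            out.append(CONFUSIONS[ch])
        else:
            out.append(ch)
    return "".join(out)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def tag(tokens):
    """Look up each token; unknown (e.g. OCR-corrupted) tokens fall back to NOUN."""
    return [LEXICON.get(t, "NOUN") for t in tokens]


def tag_accuracy(clean: str, noisy: str) -> float:
    """Fraction of tokens whose tag on noisy input matches the tag on clean input."""
    gold = tag(clean.split())
    pred = tag(noisy.split())
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    sentence = "the old man sells fine lemons"
    for rate in (0.0, 0.05, 0.1, 0.2, 0.5):
        noisy = corrupt(sentence, rate, seed=42)
        print(f"rate={rate:.2f}  CER={cer(sentence, noisy):.3f}  "
              f"tag accuracy={tag_accuracy(sentence, noisy):.2f}  {noisy!r}")
```

Because the toy tagger relies purely on lexicon lookup, a single corrupted character turns a known word into an unknown one, which illustrates how small OCR errors can propagate into downstream tagging errors. A real statistical tagger uses context and suffix features and would degrade less abruptly.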