
OCR Error Correction Using Character Correction and Feature-Based Word Classification

2016-04-21

Ido Kissos, Nachum Dershowitz

Abstract

This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, corrects the vast majority of segmentation and recognition errors, which are the most frequent error types in our dataset.
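The abstract only outlines the approach, so the following is a minimal sketch of the general idea under loudly stated assumptions, not the authors' implementation. Candidate corrections for a token are generated in two ways: single-character substitutions licensed by a weighted confusion matrix (recognition errors), and splits of a merged token into two known words (segmentation errors). Each candidate is then scored from a few features: its confusion weight, a shallow (unigram) language-model score, and an in-lexicon indicator. Everything named here is hypothetical: `CONFUSION`, `SPLIT_WEIGHT`, `UNIGRAMS`, `FEATURE_WEIGHTS`, and the functions are illustrative; the toy data uses Latin strings rather than Arabic; and a hand-weighted linear scorer stands in for the learned classifier described in the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weighted confusion matrix: (observed_char, intended_char) -> weight.
# Values are illustrative only, not taken from the paper.
CONFUSION: Dict[Tuple[str, str], float] = {
    ("c", "e"): 0.30,
    ("l", "i"): 0.25,
    ("0", "o"): 0.40,
}

# Hypothetical weight for a "missing space" (segmentation) correction.
SPLIT_WEIGHT = 0.2

# Hypothetical shallow language model: unigram counts from a toy corpus.
UNIGRAMS: Dict[str, int] = {"the": 50, "line": 10, "one": 8, "to": 40, "go": 12, "lice": 1}
TOTAL = sum(UNIGRAMS.values())

# Hand-set linear weights standing in for a learned classifier (assumption):
# (confusion log-weight, LM log-probability, in-lexicon indicator).
FEATURE_WEIGHTS = (1.0, 1.0, 3.0)


def lm_logprob(word: str) -> float:
    """Add-one smoothed unigram log-probability, with one extra slot for unseen words."""
    return math.log((UNIGRAMS.get(word, 0) + 1) / (TOTAL + len(UNIGRAMS) + 1))


def candidates(token: str) -> List[Tuple[str, float]]:
    """Generate (candidate, confusion log-weight) pairs; the token itself comes first."""
    out: List[Tuple[str, float]] = [(token, 0.0)]
    # Recognition errors: single-character substitutions licensed by the confusion matrix.
    for i, ch in enumerate(token):
        for (observed, intended), weight in CONFUSION.items():
            if ch == observed:
                out.append((token[:i] + intended + token[i + 1:], math.log(weight)))
    # Segmentation errors: split a merged token into two known words.
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in UNIGRAMS and right in UNIGRAMS:
            out.append((f"{left} {right}", math.log(SPLIT_WEIGHT)))
    return out


def features(candidate: str, conf_logw: float) -> Tuple[float, float, float]:
    """Feature vector for one candidate: confusion weight, LM score, lexicon membership."""
    words = candidate.split()
    lm = sum(lm_logprob(w) for w in words)
    in_lexicon = 1.0 if all(w in UNIGRAMS for w in words) else 0.0
    return (conf_logw, lm, in_lexicon)


def score(feats: Tuple[float, float, float]) -> float:
    """Linear score; a trained classifier would replace this in a real system."""
    return sum(w * f for w, f in zip(FEATURE_WEIGHTS, feats))


def correct(token: str) -> str:
    """Return the highest-scoring candidate; ties keep the original token (listed first)."""
    best, _ = max(
        ((cand, score(features(cand, logw))) for cand, logw in candidates(token)),
        key=lambda pair: pair[1],
    )
    return best


if __name__ == "__main__":
    for tok in ["linc", "togo", "the"]:
        print(tok, "->", correct(tok))
```

With this toy data, `linc` is corrected to `line` (a recognition error), `togo` is split into `to go` (a segmentation error), and `the` is left unchanged. Keeping the original token as the first candidate means the system only replaces a word when some correction clearly outscores it, which mirrors the idea of a classifier deciding whether to accept a correction at all.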