OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited
2016-05-01 · LREC 2016
Martin Reynaert
Abstract
We present further work on evaluating the fully automatic OCR post-correction of Early Dutch Books Online, a collection of 10,333 eighteenth-century books. In prior work we evaluated the new implementation of Text-Induced Corpus Clean-up (TICCL) against a gold standard derived from a single book in this collection. In the current paper we revisit the same collection using a sizeable random sample of 1,020 OCR post-corrected strings drawn from the full collection. Both evaluations have their own stories to tell and lessons to teach.
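The abstract does not say how the 1,020 items were drawn or scored. As a minimal sketch, assuming a uniform draw without replacement from the full list of post-corrected strings and a binary correct/incorrect judgment per item, such a sample supports an estimate of correction precision with a confidence interval. The Wilson score interval used here is one standard choice, not necessarily the one used in the paper; the function names, seed, pool, and counts are hypothetical placeholders, not TICCL code or results.

```python
import math
import random


def draw_sample(corrections, n=1020, seed=42):
    """Draw a uniform random sample of n items without replacement.

    Hypothetical helper: a fixed seed makes the sample reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(corrections, n)


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Placeholder pool standing in for all post-corrected strings (not real data).
    pool = [f"correction_{i}" for i in range(50_000)]
    sample = draw_sample(pool)
    print(len(sample))  # 1020
    # Illustrative count only: suppose 900 sampled items were judged correct.
    low, high = wilson_interval(900, len(sample))
    print(f"precision 95% CI: [{low:.3f}, {high:.3f}]")
```

Sampling from the whole collection, rather than from one book, is what lets the interval speak for corrections across all 10,333 books, under the stated assumption that the draw is uniform.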