REE-HDSC: Recognizing Extracted Entities for the Historical Database Suriname Curacao
2023-12-19
Erik Tjong Kim Sang
- github.com/ree-hdsc/ree-hdsc (official code, linked in the paper)
Abstract
We describe the project REE-HDSC and outline our efforts to improve the quality of named entities extracted automatically from texts generated by handwritten text recognition (HTR) software. We describe a six-step processing pipeline and test it by processing 19th- and 20th-century death certificates from the civil registry of Curacao. We find that the pipeline extracts dates with high precision but that the precision of person name extraction is low. Next, we show how the precision of name extraction can be improved by retraining HTR models with names, by post-processing, and by identifying and removing incorrect names.
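To make "precision" and "removing incorrect names" concrete, here is a minimal Python sketch. It assumes that extracted and gold-standard entities are represented as (certificate id, entity type, text) tuples and that a match requires exact equality. The function `filter_person_names` and its lexicon-based criterion are hypothetical illustrations, not the method used in REE-HDSC, and the data are invented toy examples.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Assumed representation for this sketch: (certificate_id, entity_type, text).
Entity = Tuple[str, str, str]


def precision(extracted: Iterable[Entity], gold: Set[Entity],
              entity_type: Optional[str] = None) -> float:
    """Share of extracted entities (optionally of one type) found in the gold set."""
    selected = [e for e in extracted if entity_type is None or e[1] == entity_type]
    if not selected:
        return 0.0
    correct = sum(1 for e in selected if e in gold)
    return correct / len(selected)


def filter_person_names(extracted: Iterable[Entity],
                        known_tokens: Set[str]) -> List[Entity]:
    """Hypothetical filter: drop PERSON entities containing a token not in the lexicon."""
    kept = []
    for cert_id, etype, text in extracted:
        if etype == "PERSON" and any(tok.lower() not in known_tokens
                                     for tok in text.split()):
            continue  # treat as a likely HTR error or non-name phrase
        kept.append((cert_id, etype, text))
    return kept


# Toy data, invented for illustration only.
gold = {
    ("c1", "DATE", "1890-03-02"),
    ("c1", "PERSON", "Maria Martis"),
    ("c2", "PERSON", "Jan de Wind"),
}
extracted = [
    ("c1", "DATE", "1890-03-02"),
    ("c1", "PERSON", "Maria Martis"),
    ("c1", "PERSON", "Overleden Maria"),  # non-name word picked up
    ("c2", "PERSON", "Jan de Wind"),
    ("c2", "PERSON", "Jan dc Wmd"),       # simulated HTR misreading
]
lexicon = {"maria", "martis", "jan", "de", "wind"}

print(precision(extracted, gold, "DATE"))    # 1.0
print(precision(extracted, gold, "PERSON"))  # 0.5
filtered = filter_person_names(extracted, lexicon)
print(precision(filtered, gold, "PERSON"))   # 1.0
```

A filter of this kind raises precision only by discarding entities, so it can also remove correct but rare names; recall is worth checking alongside it.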