Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts
2021-04-30 · NAACL (BioNLP) 2021
Mark-Christoph Müller, Sucheta Ghosh, Ulrike Wittig, Maja Rey
- Code: github.com/nlpAThits/BioNLP2021 (official)
Abstract
We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
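To make the task concrete, here is a minimal sketch of word-level alignment and its F-score evaluation. It is not the authors' procedure, which the abstract does not spell out: it assumes both documents are already tokenized and swaps in Python's standard `difflib.SequenceMatcher` as a generic exact-match aligner. The names `align_words` and `f_score` and the sample sentences are hypothetical, chosen for illustration only.

```python
from difflib import SequenceMatcher


def align_words(ocr_tokens, text_tokens):
    """Return (ocr_index, text_index) pairs for exactly matching tokens.

    Assumption: a generic longest-matching-block aligner stands in for
    whatever off-the-shelf components the paper actually uses.
    """
    matcher = SequenceMatcher(a=ocr_tokens, b=text_tokens, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block covers `size` consecutive tokens identical in both inputs.
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of alignment pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the OCR output misreads "enzyme" as "enzyrne".
ocr = "the enzyrne activity was measured".split()
text = "the enzyme activity was measured".split()
gold = [(i, i) for i in range(len(text))]

pairs = align_words(ocr, text)
score = f_score(pairs, gold)
```

In this sketch the misrecognized token stays unaligned, so recall drops while precision stays perfect. Fuzzy token matching would recover such cases, which is the kind of gain that pre- and post-processing can plausibly provide.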