Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign

2012-05-01 · LREC 2012

Olivier Galibert, Sophie Rosset, Cyril Grouin, Pierre Zweigenbaum, Ludovic Quintard

Abstract

Within the framework of the Quaero project, we proposed a new definition of named entities that extends both their coverage and their structure. Under this definition, extended named entities are both hierarchical and compositional. In this paper, we focus on the annotation of a corpus of press archives, OCRed from French newspapers of December 1890. We present the methodology used to produce the corpus and describe its characteristics in terms of named entity annotation. The annotated corpus has been used in an evaluation campaign; we present this evaluation, the metrics used, and the results obtained by the participants.
