Construction and Annotation of a French Folkstale Corpus

2014-05-01, LREC 2014

Anne Garcia-Fernandez, Anne-Laure Ligozat, Anne Vilnat

Abstract

In this paper, we present the digitization and annotation of a corpus of tales composed of historical texts (with old French parts). To our knowledge, it is the only available French tales corpus classified according to the Aarne & Thompson classification. We first studied whether the pre-processing tools, namely OCR and PoS-tagging, are accurate enough to allow automatic analysis. We also manually annotated this corpus with several types of information that could prove useful for future work: character references, episodes, and motifs. The contributions are the creation of a corpus of French tales from classical anthropology material, which will be made available to the community; the evaluation of OCR and NLP tools on this corpus; and the annotation with anthropological information.