D(H)ante: A New Set of Tools for XIII Century Italian

2016-05-01 · LREC 2016

Angelo Basile, Federico Sangati

Abstract

In this paper we describe 1) the process of converting a corpus of Dante Alighieri from a TEI XML format into a pseudo-CoNLL format; 2) how a PoS tagger trained on modern Italian performs on Dante's Italian; and 3) the performance of two different PoS taggers trained on the given corpus. We are making our conversion scripts and models available to the community. The two models trained on the Dante corpus perform reasonably well. The tool used for the conversion process might prove useful for bridging the gap between traditional digital humanities and modern NLP applications, since the original TEI format is not usually suitable for processing with standard NLP tools. We believe our work will serve both communities: the DH community will be able to tag new documents, and the NLP world will have an easier way to convert existing documents to a standardized machine-readable format.
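As a rough illustration of step 1), the sketch below turns TEI-encoded tokens into a tab-separated, CoNLL-style layout: one token per line, a blank line between sentences. It is not the authors' conversion script. It assumes, hypothetically, that sentences are TEI `<s>` elements and tokens are `<w>` elements carrying `lemma` and `pos` attributes, and that the output columns are ID, FORM, LEMMA, POS. The real markup of the Dante corpus and the exact columns of the authors' pseudo-CoNLL format may differ, and the tag values in the sample are illustrative placeholders.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative input only: the <s>/<w> layout and the lemma/pos values are
# assumptions, not taken from the Dante corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<s><w lemma="nel" pos="ADP">Nel</w> <w lemma="mezzo" pos="NOUN">mezzo</w></s>
</body></text></TEI>"""


def tei_to_conll(xml_text: str) -> str:
    """Convert TEI <s>/<w> markup into tab-separated pseudo-CoNLL text.

    Hypothetical columns: ID, FORM, LEMMA, POS. Missing attributes become "_".
    """
    root = ET.fromstring(xml_text)
    blocks = []
    for sent in root.iter(f"{{{TEI_NS}}}s"):
        rows = []
        # Token IDs restart at 1 for every sentence, as in CoNLL files.
        for i, w in enumerate(sent.iter(f"{{{TEI_NS}}}w"), start=1):
            form = "".join(w.itertext()).strip()
            lemma = w.get("lemma", "_")
            pos = w.get("pos", "_")
            rows.append(f"{i}\t{form}\t{lemma}\t{pos}")
        if rows:
            blocks.append("\n".join(rows))
    # One token per line; sentences separated by a single blank line.
    return "\n\n".join(blocks) + "\n" if blocks else ""


if __name__ == "__main__":
    print(tei_to_conll(SAMPLE), end="")
    # 1	Nel	nel	ADP
    # 2	mezzo	mezzo	NOUN
```

Keeping the output as plain tab-separated columns is what makes the converted corpus consumable by standard taggers and evaluation scripts, which generally expect this line-oriented layout rather than nested XML.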
