Boosting the creation of a treebank

2014-05-01 · LREC 2014

Blanca Arias, Núria Bel, Mercè Lorente, Montserrat Marimón, Alba Milà, Jorge Vivaldi, Muntsa Padró, Marina Fomicheva, Imanol Larrea

Abstract

In this paper we present the results of an ongoing experiment in bootstrapping a treebank for Catalan using a dependency parser trained on Spanish sentences. To save time and cost, our approach exploits the typological similarities between Catalan and Spanish to create a first Catalan data set quickly by: (i) automatically annotating Catalan sentences with a de-lexicalized Spanish parser, (ii) manually correcting the parses, and (iii) using the corrected Catalan sentences to train a Catalan parser. The results show that about 1,000 parsed sentences are required to train a Catalan parser; these were produced in 4 months by 2 annotators.
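As an illustration of what "de-lexicalized" can mean in step (i), the sketch below strips word forms and lemmas from dependency-annotated sentences so that a parser sees only part-of-speech information, which is what lets a model trained on one language be applied to a closely related one. The abstract does not specify the data format or the exact de-lexicalization scheme used; the tab-separated CoNLL-X column layout, the function name `delexicalize`, and the choice to overwrite FORM and LEMMA with the coarse POS tag are all assumptions made for this example.

```python
def delexicalize(conll_text):
    """Overwrite FORM and LEMMA with the coarse POS tag in CoNLL-X style lines.

    Assumed column order (CoNLL-X): ID, FORM, LEMMA, CPOSTAG, POSTAG,
    FEATS, HEAD, DEPREL, PHEAD, PDEPREL. This is an illustrative scheme,
    not necessarily the one used in the paper.
    """
    out = []
    for line in conll_text.splitlines():
        # Keep sentence-separating blank lines and comment lines unchanged.
        if not line.strip() or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        cols[1] = cols[3]  # FORM  -> coarse POS tag
        cols[2] = cols[3]  # LEMMA -> coarse POS tag
        out.append("\t".join(cols))
    return "\n".join(out)


# Hypothetical Catalan token line; the tags and labels are made up for the example.
sample = "1\tgat\tgat\tNOUN\tNOUN\t_\t2\tsubj\t_\t_"
print(delexicalize(sample))
```

Under this scheme, Spanish training data and Catalan input would both be passed through the same function before training and parsing, so that the parser never depends on language-specific vocabulary.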

Tasks

Dependency Parsing, Machine Translation, Question Answering
