Distantly Supervised POS Tagging of Low-Resource Languages under Extreme Data Sparsity: The Case of Hittite

2017-08-01 · WS 2017

Maria Sukhareva, Francesco Fuscagni, Johannes Daxenberger, Susanne Görke, Doris Prechel, Iryna Gurevych

Abstract

This paper presents a statistical approach to automatic morphosyntactic annotation of Hittite transcripts. Hittite is an extinct Indo-European language written in cuneiform script. No morphosyntactic annotations are currently available for Hittite, so we explored methods of distant supervision. The annotations were projected from parallel German translations of the Hittite texts. To reduce data sparsity, we stemmed both the German and the Hittite texts. As there is no off-the-shelf Hittite stemmer, we developed one for this purpose. The resulting annotation projections were used to train a POS tagger, which achieved an accuracy of 69% on a test sample. To our knowledge, this is the first attempt at statistical POS tagging of a cuneiform language.
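The projection step described in the abstract can be illustrated with a minimal sketch. It assumes that word alignments between a tagged source sentence (German here) and an untagged target sentence (Hittite here) are already available as index pairs. The function name `project_tags`, the tag labels, and the toy alignment are hypothetical and are not taken from the paper. The authors' actual pipeline, including stemming and tagger training, is not reproduced.

```python
from collections import Counter, defaultdict


def project_tags(src_tags, tgt_len, alignment):
    """Project POS tags from source tokens onto aligned target tokens.

    src_tags:  list of POS tags, one per source token.
    tgt_len:   number of tokens in the target sentence.
    alignment: iterable of (src_index, tgt_index) pairs (assumed given).

    Returns a list of length tgt_len. Each entry is the tag that most of
    the aligned source tokens carry, or None for unaligned target tokens.
    """
    votes = defaultdict(Counter)
    for src_i, tgt_i in alignment:
        votes[tgt_i][src_tags[src_i]] += 1

    projected = []
    for tgt_i in range(tgt_len):
        if tgt_i in votes:
            projected.append(votes[tgt_i].most_common(1)[0][0])
        else:
            projected.append(None)  # no aligned source token, so no label
    return projected


# Toy example (illustrative data only): three tagged source tokens,
# three target tokens, the last of which has no alignment.
source_tags = ["DET", "NOUN", "VERB"]
toy_alignment = [(1, 0), (2, 1)]
print(project_tags(source_tags, 3, toy_alignment))
```

Target tokens left as `None` carry no projected label. A downstream tagger would have to be trained on the labeled tokens only or handle the gaps in some other way.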
