Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations

2020-12-01 · COLING 2020

Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, Lori Levin

Abstract

Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers. Manual production of IGT takes time and requires linguistic expertise. We attempt to address this issue by creating automatic glossing models, using modern multi-source neural models that additionally leverage easy-to-collect translations. We also explore cross-lingual transfer and a simple output length control mechanism, which further refine our models. Evaluated on three challenging low-resource scenarios, our approach significantly outperforms a recent, state-of-the-art baseline, particularly improving on overall accuracy as well as lemma and tag recall.
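
As a rough illustration of the IGT format the abstract refers to, the sketch below stores one glossed example as three aligned tiers (source words, glosses, free translation) and scores a predicted gloss line by exact word-level match. The names `IGTEntry` and `word_accuracy` and the Spanish example are assumptions made for illustration only; they are not taken from the paper, whose data format, models, and metrics may differ.

```python
from dataclasses import dataclass


@dataclass
class IGTEntry:
    """One interlinear glossed example (hypothetical container, not the paper's format)."""
    source: list[str]      # source-language words
    gloss: list[str]       # one gloss per source word (lemma plus grammatical tags)
    translation: str       # free translation of the whole sentence


def word_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of reference positions whose predicted gloss matches exactly."""
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Illustrative example written for this sketch, not drawn from the paper's data.
entry = IGTEntry(
    source=["los", "gatos", "duermen"],
    gloss=["the.PL", "cat-PL", "sleep-3PL"],
    translation="The cats sleep.",
)

# A hypothetical model output with one wrong tag on the second word.
predicted = ["the.PL", "cat-SG", "sleep-3PL"]
score = word_accuracy(predicted, entry.gloss)
print(f"word-level accuracy: {score:.2f}")
```

Dividing by the reference length means a prediction that is too short is penalized for the missing words, which is one reason controlling output length matters in a glossing setup of this kind.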