Prompsit’s Submission to the IWSLT 2018 Low Resource Machine Translation Task

2018-10-01 · IWSLT (EMNLP) 2018

Víctor M. Sánchez-Cartagena

Abstract

This paper presents Prompsit Language Engineering’s submission to the IWSLT 2018 Low Resource Machine Translation task. Our submission is based on cross-lingual learning: a multilingual neural machine translation system was created with the sole purpose of improving translation quality on the Basque-to-English language pair. The multilingual system was trained on a combination of in-domain data, pseudo in-domain data obtained via cross-entropy data selection, and backtranslated data. We morphologically segmented Basque text with a novel approach that only requires a dictionary, such as those used by spell checkers, and proved that this segmentation approach outperforms the widespread byte pair encoding strategy for this task.
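
The abstract does not say how the cross-entropy data selection was implemented. As a rough illustration of the general idea only, the sketch below ranks general-domain sentences by cross-entropy difference, a widely used selection criterion: a sentence scores well when an in-domain language model finds it more predictable than a general-domain one. The add-one-smoothed unigram models, the toy sentences, and all function names are assumptions made for this example, and the general-domain model is trained on the candidate pool itself for brevity.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Build an add-one-smoothed unigram model; returns a log-probability function.

    A unigram model is a deliberate simplification for this sketch.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen tokens

    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a non-empty sentence under a model."""
    tokens = sentence.split()
    return -sum(logprob(t) for t in tokens) / len(tokens)


def select_pseudo_in_domain(in_domain, general, keep):
    """Return the `keep` general-domain sentences that look most in-domain.

    Score = H_in(s) - H_gen(s); lower means closer to the in-domain data.
    """
    lp_in = train_unigram(in_domain)
    lp_gen = train_unigram(general)
    candidates = [s for s in general if s.split()]  # skip empty lines
    ranked = sorted(
        candidates,
        key=lambda s: cross_entropy(lp_in, s) - cross_entropy(lp_gen, s),
    )
    return ranked[:keep]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    in_domain = [
        "the talk was about neural translation",
        "this talk covers translation quality",
    ]
    general = [
        "stock prices fell sharply today",
        "the talk covers neural translation",
        "weather was sunny today",
    ]
    print(select_pseudo_in_domain(in_domain, general, keep=1))
```

In practice the two models would be stronger n-gram or neural language models, and the number of sentences kept would be tuned on held-out data.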
