HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation

2019-11-01 · IJCNLP 2019

Brian Thompson, Rebecca Knowles, Xuan Zhang, Huda Khayrallah, Kevin Duh, Philipp Koehn

Abstract

Bilingual lexicons are valuable resources used by professional human translators. While these resources can be easily incorporated into statistical machine translation, it is unclear how best to do so in the neural framework. In this work, we present the HABLex dataset, designed to test methods for integrating bilingual lexicons into neural machine translation. Our data consists of human-generated alignments of words and phrases in machine translation test sets in three language pairs (Russian-English, Chinese-English, and Korean-English), resulting in clean bilingual lexicons that are well matched to the reference. We also present two simple baselines, constrained decoding and continued training, and an improvement to continued training that addresses overfitting.
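
To illustrate how word and phrase alignments over a test set could yield a bilingual lexicon, here is a minimal Python sketch. It is not the HABLex annotation pipeline or file format: the function name extract_lexicon, the half-open token-span representation, and the example sentence pair are all assumptions made for illustration only.

```python
from collections import defaultdict


def extract_lexicon(sentence_pairs):
    """Collect source-phrase -> target-phrase entries from aligned spans.

    Hypothetical input format (an assumption, not the HABLex format):
    each item is (src_tokens, tgt_tokens, span_pairs), where span_pairs
    is a list of ((src_start, src_end), (tgt_start, tgt_end)) half-open
    token ranges marked as translations of each other by an annotator.
    """
    lexicon = defaultdict(set)
    for src_tokens, tgt_tokens, span_pairs in sentence_pairs:
        for (s_start, s_end), (t_start, t_end) in span_pairs:
            src_phrase = " ".join(src_tokens[s_start:s_end])
            tgt_phrase = " ".join(tgt_tokens[t_start:t_end])
            lexicon[src_phrase].add(tgt_phrase)
    # Sort the translations so the output is deterministic.
    return {src: sorted(tgts) for src, tgts in lexicon.items()}


# Invented Russian-English example, for illustration only.
example = [
    (
        ["машинный", "перевод", "полезен"],
        ["machine", "translation", "is", "useful"],
        [((0, 2), (0, 2)), ((2, 3), (3, 4))],
    ),
]
lexicon = extract_lexicon(example)
```

Because entries come from spans aligned within the test sentences themselves, every target phrase in the resulting lexicon appears in the corresponding reference, which is one way to read "well matched to the reference" in the abstract.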

Tasks

Machine Translation, Translation
