Learning to Generate Word- and Phrase-Embeddings for Efficient Phrase-Based Neural Machine Translation

2019-11-01 · WS 2019

Chan Young Park, Yulia Tsvetkov

Abstract

Neural machine translation (NMT) often fails in one-to-many translation, e.g., in the translation of multi-word expressions, compounds, and collocations. To improve the translation of phrases, phrase-based NMT systems have been proposed; these typically combine word-based NMT with external phrase dictionaries or with phrase tables from phrase-based statistical MT systems. These solutions introduce significant overhead in additional resources and computational cost. In this paper, we introduce a phrase-based NMT model built upon continuous-output NMT, in which the decoder generates embeddings of words or phrases. The model uses a fertility module, which guides the decoder to generate embeddings of sequences of varying lengths. We show that our model learns to translate phrases better, performing on par with state-of-the-art phrase-based NMT. Since our model does not resort to softmax computation over a huge vocabulary of phrases, it trains about 112x faster than the baseline.
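
To illustrate the continuous-output idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that a decoder step has already produced an embedding vector and shows how that vector could be mapped to a word or phrase by cosine nearest-neighbor lookup in a single embedding table, instead of a softmax over a phrase vocabulary. The table `EMBEDDINGS`, its toy 3-dimensional values, and the helper names `cosine` and `nearest_entry` are hypothetical; the fertility module is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_entry(predicted, table):
    """Return the word or phrase whose embedding is closest to `predicted`."""
    return max(table, key=lambda entry: cosine(predicted, table[entry]))


# Hypothetical toy table: words and phrases share one embedding space.
EMBEDDINGS = {
    "kick": [1.0, 0.0, 0.0],
    "bucket": [0.0, 1.0, 0.0],
    "kick the bucket": [0.0, 0.0, 1.0],
}

# Stand-in for the vector a decoder step would emit (assumed, not learned).
decoder_output = [0.1, 0.2, 0.9]
print(nearest_entry(decoder_output, EMBEDDINGS))  # prints "kick the bucket"
```

In this sketch the table is consulted only when an output vector is mapped back to text, which is one way to read the abstract's point that no softmax over a large phrase vocabulary is needed during training.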

Tasks

Decoder, Machine Translation, NMT, Translation
