Linguistically Informed Hindi-English Neural Machine Translation

2020-05-01 · LREC 2020

Vikrant Goyal, Pruthwik Mishra, Dipti Misra Sharma

Abstract

Hindi-English machine translation is a challenging problem owing to several factors, including the morphological complexity and relatively free word order of Hindi, as well as the lack of sufficient parallel training data. Neural Machine Translation (NMT) is a rapidly advancing MT paradigm that has shown promising results for many language pairs, especially when large amounts of training data are available. To overcome the data sparsity caused by the lack of large Hindi-English parallel corpora, we propose a method that employs additional linguistic knowledge encoded in the different linguistic phenomena exhibited by Hindi. We generalize the embedding layer of the state-of-the-art Transformer model to incorporate linguistic features such as POS tags, lemmas and morphological features to improve translation performance. We compare the results obtained by incorporating this knowledge against baseline systems and demonstrate significant performance improvements. Although Transformer NMT models are highly effective at learning language constructs, we show that using specific features further improves translation performance.
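The abstract does not spell out how the features enter the embedding layer. One common way to build such a "generalized" embedding layer, used in factored NMT, is to keep a separate lookup table per input factor (word, lemma, POS tag, morph features) and concatenate the per-factor vectors so that their total width equals the model dimension, before adding the Transformer's sinusoidal positional encoding. The Python sketch below assumes that concatenation scheme, a `word|lemma|POS|morph` token format, and randomly initialised tables; the class name `FactoredEmbedding`, the `|` separator, the factor sizes and the morph labels are all hypothetical and are not taken from the paper.

```python
import math
import random


class FactoredEmbedding:
    """Hypothetical sketch: one embedding table per factor
    (word, lemma, POS tag, morph features), concatenated per token."""

    def __init__(self, vocabs, dims, seed=0):
        if len(vocabs) != len(dims):
            raise ValueError("need one embedding size per factor")
        rng = random.Random(seed)
        # symbol -> row index, one dict per factor; every vocab must contain "<unk>"
        self.index = [{sym: i for i, sym in enumerate(v)} for v in vocabs]
        # randomly initialised tables (a trained model would learn these)
        self.tables = [
            [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in v]
            for v, d in zip(vocabs, dims)
        ]
        # the concatenated size plays the role of the Transformer's d_model
        self.d_model = sum(dims)

    def embed_token(self, factors):
        if len(factors) != len(self.tables):
            raise ValueError(
                "expected %d factors, got %d" % (len(self.tables), len(factors)))
        vec = []
        for index, table, sym in zip(self.index, self.tables, factors):
            row = index.get(sym, index["<unk>"])  # unseen symbols fall back to <unk>
            vec.extend(table[row])
        return vec

    def embed_sentence(self, tokens, sep="|"):
        """Embed 'word|lemma|POS|morph' tokens and add sinusoidal positions."""
        out = []
        for pos, tok in enumerate(tokens):
            emb = self.embed_token(tok.split(sep))
            pe = positional_encoding(pos, self.d_model)
            out.append([e + p for e, p in zip(emb, pe)])
        return out


def positional_encoding(pos, d):
    # Transformer sinusoidal encoding: sin on even dimensions, cos on odd ones
    return [
        math.sin(pos / 10000 ** (2 * (i // 2) / d)) if i % 2 == 0
        else math.cos(pos / 10000 ** (2 * (i // 2) / d))
        for i in range(d)
    ]


# Usage with a toy Hindi sentence ("the boys play"); morph labels are made up.
vocabs = [
    ["<unk>", "लड़के", "खेलते", "हैं"],  # words
    ["<unk>", "लड़का", "खेल", "है"],     # lemmas
    ["<unk>", "NN", "VM", "VAUX"],        # POS tags
    ["<unk>", "sg", "pl"],                # hypothetical morph labels
]
emb = FactoredEmbedding(vocabs, dims=[8, 4, 2, 2])
sent = emb.embed_sentence(["लड़के|लड़का|NN|pl", "खेलते|खेल|VM|pl", "हैं|है|VAUX|pl"])
print(len(sent), len(sent[0]))  # 3 16
```

Concatenation lets each factor have its own width, so a small POS table can sit alongside a larger word table. An alternative is to project every factor to the full model width and sum the vectors, which keeps the model width independent of the number of factors.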

Tasks

LEMMA, Machine Translation, MORPH, NMT, POS TAG, Translation
