Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation

2020-07-01 · ACL 2020

Arya D. McCarthy, Xi-An Li, Jiatao Gu, Ning Dong

Abstract

This paper proposes a simple and effective approach to address the problem of posterior collapse in conditional variational autoencoders (CVAEs). It thereby improves the performance of machine translation models trained on noisy or monolingual data, as well as in conventional settings. Extending the Transformer and conditional VAEs, our proposed latent variable model measurably prevents posterior collapse by (1) using a modified evidence lower bound (ELBO) objective which promotes mutual information between the latent variable and the target, and (2) guiding the latent variable with an auxiliary bag-of-words prediction task. As a result, the proposed model yields improved translation quality compared to existing variational NMT models on WMT Ro↔En and De↔En. With latent variables being effectively utilized, our model demonstrates improved robustness over a non-latent Transformer in handling uncertainty: exploiting noisy source-side monolingual data (up to +3.2 BLEU), and training with weakly aligned web-mined parallel data (up to +4.7 BLEU).
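
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a diagonal-Gaussian latent variable and a bag-of-words head that scores every vocabulary item from the latent; the function names, the weights `kl_weight` and `bow_weight`, and the way the terms are summed are all hypothetical. The mutual-information modification to the ELBO is not reproduced here, since the abstract does not give its form; only a standard KL term and the auxiliary bag-of-words loss are shown.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def bow_loss(bow_logits, target_ids):
    """Negative log-likelihood of the target tokens, ignoring their order.

    bow_logits: one score per vocabulary item, assumed to be predicted
                from the latent variable (hypothetical head).
    target_ids: token ids of the target sentence.
    """
    log_probs = log_softmax(bow_logits)
    return -sum(log_probs[t] for t in target_ids)


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def training_loss(recon_nll, kl, bow, kl_weight=1.0, bow_weight=1.0):
    """Hypothetical weighted sum of reconstruction, KL, and bag-of-words terms."""
    return recon_nll + kl_weight * kl + bow_weight * bow


# Usage: a 4-word vocabulary with uniform bag-of-words scores and a
# posterior whose mean is shifted by 1 from a standard-normal prior.
bow = bow_loss([0.0, 0.0, 0.0, 0.0], target_ids=[1, 3])
kl = gaussian_kl([1.0], [0.0], [0.0], [0.0])
loss = training_loss(recon_nll=2.0, kl=kl, bow=bow)
```

The bag-of-words term only lowers the loss if the latent carries information about which target words appear, which is the intuition behind using it to discourage the decoder from ignoring the latent.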

Tasks

de-en, Machine Translation, NMT, Translation
