
Enhancing the Nonlinear Mutual Dependencies in Transformers with Mutual Information

2021-11-16 · ACL ARR September 2021

Anonymous

Abstract

Transformers suffer from the predictive-uncertainty problem. We show that pre-trained Transformers can be further regularized with mutual information to alleviate this issue in Neural Machine Translation (NMT). In this paper, we explicitly capture the nonlinear mutual dependencies between the two types of attention in the decoder to reduce the model's uncertainty about token-token interactions. Specifically, we adopt an unsupervised mutual-information-maximization objective over self-attentions, following the contrastive learning methodology, and estimate the mutual information with InfoNCE. Experimental results on WMT'14 En-De and WMT'14 En-Fr demonstrate consistent effectiveness and clear improvements of our model over strong baselines. Quantifying the model uncertainty further verifies our hypothesis. The proposed plug-and-play approach can be easily incorporated into and deployed with pre-trained Transformer models. Code will be released soon.
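The abstract's core mechanism is the InfoNCE estimator: paired representations are scored against each other, and the cross-entropy of picking the true partner among in-batch negatives yields a lower bound on their mutual information. The sketch below illustrates this bound in NumPy under stated assumptions; the cosine-similarity critic, the temperature value, and the toy data are illustrative choices, not the paper's actual critic or attention representations.

```python
import numpy as np

def info_nce_bound(x, y, temperature=0.1):
    """Contrastive InfoNCE lower bound on the mutual information I(X; Y).

    x, y: (N, d) arrays of paired representations; row i of x is the
    positive match for row i of y, and all other rows serve as negatives.
    Returns (loss, mi_bound), where mi_bound = log(N) - loss <= I(X; Y).
    """
    # Cosine-similarity critic (one common choice; the paper's critic
    # over attention representations may differ).
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    scores = x @ y.T / temperature                # (N, N) score matrix
    # Row-wise log-softmax; diagonal entries are the positive pairs.
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    loss = -np.mean(np.diag(log_prob))            # cross-entropy on positives
    mi_bound = np.log(len(x)) - loss              # InfoNCE lower bound
    return loss, mi_bound

# Toy usage: strongly correlated pairs yield a large MI estimate.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))
y = x + 0.1 * rng.normal(size=(64, 16))          # y is a noisy copy of x
loss, mi = info_nce_bound(x, y)
```

Note that the bound saturates at log(N), so larger batches (more negatives) are needed to certify large mutual information — one reason contrastive objectives of this kind are typically trained with sizable batches.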
