Joint Source-Target Self Attention with Locality Constraints

2019-05-16

José A. R. Fonollosa, Noe Casas, Marta R. Costa-jussà

Code

github.com/jarfo/joint (official, in paper, PyTorch)
github.com/lkfo415579/joint (PyTorch)

Abstract

The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over source and target sequences. In this paper we study a new architecture that breaks with both conventions. Our simplified architecture consists of the decoder part of a Transformer model, based on self-attention, but with locality constraints applied to the attention receptive field. During training, both source and target sentences are fed to the network, which is trained as a language model. At inference time, the target tokens are predicted autoregressively, with the source sequence serving as the preceding tokens. The proposed model achieves a new state of the art of 35.7 BLEU on IWSLT'14 German-English and matches the best reported results in the literature on the WMT'14 English-German and WMT'14 English-French translation benchmarks.
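The two ideas in the abstract, a locality-constrained attention mask over the concatenated source and target, and autoregressive decoding with the source as prefix, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the fixed symmetric window, the rule that source positions attend only to source positions, the separator token `sep_id`, and the `next_token` callable are all assumptions made for illustration.

```python
def joint_local_mask(src_len, tgt_len, window):
    """Return a boolean matrix where mask[i][j] is True if position i may attend to j.

    Positions 0..src_len-1 hold source tokens; the remaining positions hold
    target tokens. All rules below are illustrative assumptions.
    """
    total = src_len + tgt_len
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if i < src_len:
                # Assumption: source positions see source tokens in both directions.
                visible = j < src_len
            else:
                # Target positions see themselves and everything before them
                # (source and earlier target tokens), as in a language model.
                visible = j <= i
            # Assumption: locality means a fixed window over joint positions.
            local = abs(i - j) <= window
            mask[i][j] = visible and local
    return mask


def greedy_translate(next_token, source_ids, sep_id, eos_id, max_len=50):
    """Greedy decoding with the source sequence as the initial context.

    `next_token` is a hypothetical callable that maps the token ids seen so
    far to the id of the most likely next token.
    """
    prefix = list(source_ids) + [sep_id]  # hypothetical separator token
    output = []
    for _ in range(max_len):
        token = next_token(prefix + output)
        if token == eos_id:
            break
        output.append(token)
    return output
```

In this sketch a single sequence carries both sentences, so one mask replaces the separate encoder self-attention, decoder self-attention, and cross-attention patterns of an encoder-decoder model. With a fixed window, target positions far from the sentence boundary would see only part of the source, so a real model would need wider or unconstrained windows somewhere; the sketch leaves that choice open.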

Tasks

Decoder, Language Modeling, Language Modelling, Machine Translation, Translation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
IWSLT2014 German-English	Local Joint Self-attention	BLEU score	35.7	—	Unverified
WMT2014 English-French	Local Joint Self-attention	BLEU score	43.3	—	Unverified
WMT2014 English-German	Local Joint Self-attention	BLEU score	29.7	—	Unverified
