Recurrent Attention for the Transformer
2021-11-01 · EMNLP (Insights) 2021
Jan Rosendahl, Christian Herold, Frithjof Petrick, Hermann Ney
Abstract
In this work, we conduct a comprehensive investigation of one of the centerpieces of modern machine translation systems: the encoder-decoder attention mechanism. Motivated by the concept of first-order alignments, we extend the (cross-)attention mechanism with a recurrent connection, allowing direct access to previous attention/alignment decisions. We propose several ways to include such a recurrence in the attention mechanism. Evaluating their performance across different translation tasks, we conclude that these extensions and dependencies are not beneficial for the translation performance of the Transformer architecture.
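To make the idea concrete, here is a minimal NumPy sketch of cross-attention with one possible recurrent connection: each decoder step's attention distribution is interpolated with the previous step's distribution via a mixing weight `gate`. This is an illustrative assumption, not the paper's actual formulation — the authors propose several variants, and the function name, the interpolation form, and the fixed `gate` parameter are all hypothetical choices made here for clarity.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def recurrent_cross_attention(queries, keys, values, gate=0.5):
    """Cross-attention with a recurrent link to the previous step's
    attention weights (one simple illustrative variant).

    queries: (T, d) decoder states, processed step by step
    keys, values: (S, d) encoder states
    gate: mixing weight for the previous attention distribution
    Returns (T, d) context vectors and (T, S) attention weights.
    """
    d = queries.shape[-1]
    prev_alpha = None
    outputs, alphas = [], []
    for q in queries:
        scores = keys @ q / np.sqrt(d)     # (S,) scaled dot-product scores
        alpha = softmax(scores)            # standard attention distribution
        if prev_alpha is not None:
            # Recurrent connection: blend in the previous step's attention.
            alpha = (1 - gate) * alpha + gate * prev_alpha
        outputs.append(alpha @ values)     # context vector for this step
        alphas.append(alpha)
        prev_alpha = alpha
    return np.stack(outputs), np.stack(alphas)
```

Because the recurrence here is a convex combination of two probability distributions, each step's mixed attention weights still sum to one; other variants (e.g., feeding previous weights into the score computation before the softmax) would achieve the same normalization differently.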