Posterior Attention Models for Sequence to Sequence Learning

2019-05-01 · ICLR 2019

Shiv Shankar, Sunita Sarawagi


Abstract

Modern neural architectures critically rely on attention for mapping structured inputs to sequences. In this paper we show that prevalent attention architectures do not adequately model the dependence between the attention and output variables along the length of a predicted sequence. We present an alternative architecture called Posterior Attention Models which, relying on a principled factorization of the full joint distribution of the attention and output variables, proposes two major changes. First, the position where attention is marginalized is changed from the input to the output. Second, the attention propagated to the next decoding stage is a posterior attention distribution conditioned on the output. Empirically, on five translation and two morphological inflection tasks, the proposed posterior attention models yield better predictions and alignment accuracy than existing attention models.
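The two changes described in the abstract can be illustrated with a toy, single-step sketch. This is not the authors' implementation: the function names, the three-position input, and the three-word vocabulary are all hypothetical, and the per-position output distributions are hard-coded rather than produced by a decoder network. It only shows (1) mixing output distributions under the attention weights instead of mixing input encodings, and (2) applying Bayes' rule to obtain an attention distribution conditioned on the emitted output.

```python
def output_marginal(prior, per_position_probs):
    """Marginalize attention at the output: P(y) = sum_a P(a) * P(y | x_a)."""
    vocab_size = len(per_position_probs[0])
    return [
        sum(p_a * probs[y] for p_a, probs in zip(prior, per_position_probs))
        for y in range(vocab_size)
    ]


def posterior_attention(prior, per_position_probs, y):
    """Posterior attention: P(a | y) is proportional to P(a) * P(y | x_a)."""
    joint = [p_a * probs[y] for p_a, probs in zip(prior, per_position_probs)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical toy values (assumptions, not taken from the paper).
prior = [0.5, 0.3, 0.2]            # prior attention over 3 input positions
per_position_probs = [             # P(y | x_a) for each position a, 3-word vocab
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]

marginal = output_marginal(prior, per_position_probs)
posterior = posterior_attention(prior, per_position_probs, y=1)

print("output distribution:", marginal)
print("posterior attention given y=1:", posterior)
```

In this toy setting, the prior puts the most weight on position 0, but after observing output token 1 the posterior shifts most of its mass to position 1, since that position explains the token best. In the abstract's description, it is this posterior that is passed on to the next decoding step.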

Tasks

Morphological Inflection, Position, Translation
