Cross Attention Augmented Transducer Networks for Simultaneous Translation

2021-11-01 · EMNLP 2021 · Code Available

Dan Liu, Mengge Du, Xiaoxi Li, Ya Li, Enhong Chen

Abstract

This paper proposes a novel architecture, the Cross Attention Augmented Transducer (CAAT), for simultaneous translation. The framework jointly optimizes the policy and the translation model. To consider all possible READ-WRITE simultaneous translation action paths, we adapt RNN-T, an online automatic speech recognition (ASR) model, but remove its strong monotonic constraint; this removal is critical for translation, which must handle reordering. To make CAAT work, we introduce a novel latency loss whose expectation can be optimized by a forward-backward algorithm. We implement CAAT with the Transformer, though the general CAAT architecture can also be built on other attention-based encoder-decoder frameworks. Experiments on both speech-to-text (S2T) and text-to-text (T2T) simultaneous translation tasks show that CAAT achieves significantly better latency-quality trade-offs than state-of-the-art simultaneous translation approaches.
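The abstract mentions a latency loss whose expectation is optimized by a forward-backward algorithm over all READ-WRITE action paths. The sketch below is an illustrative, simplified version of that idea, not the paper's implementation: it computes the expected delay over all monotonic READ/WRITE paths on a (T source reads) × (U target writes) grid by forward-backward dynamic programming. The policy probabilities `p_write` and the delay definition (number of source tokens read at each WRITE) are assumptions made for this example.

```python
import numpy as np

def expected_latency(p_write):
    """Expected delay over all READ/WRITE paths via forward-backward.

    p_write[t, u] (shape (T+1, U)) is the assumed probability of writing
    target token u+1 after having read t source tokens; READ has the
    complementary probability. A path runs from state (0, 0) to (T, U);
    at the boundaries the only legal action is forced. Each WRITE at
    state (t, u) is charged a delay of t (source tokens read so far).
    """
    T = p_write.shape[0] - 1
    U = p_write.shape[1]

    def pw(t, u):  # WRITE probability at state (t, u)
        return 1.0 if t == T else p_write[t, u]

    def pr(t, u):  # READ probability at state (t, u)
        return 1.0 if u == U else 1.0 - p_write[t, u]

    # Forward pass: alpha[t, u] = total probability of reaching (t, u)
    alpha = np.zeros((T + 1, U + 1))
    alpha[0, 0] = 1.0
    for t in range(T + 1):
        for u in range(U + 1):
            if t > 0:
                alpha[t, u] += alpha[t - 1, u] * pr(t - 1, u)
            if u > 0:
                alpha[t, u] += alpha[t, u - 1] * pw(t, u - 1)

    # Backward pass: beta[t, u] = total probability of finishing from (t, u)
    beta = np.zeros((T + 1, U + 1))
    beta[T, U] = 1.0
    for t in range(T, -1, -1):
        for u in range(U, -1, -1):
            if t == T and u == U:
                continue
            if t < T:
                beta[t, u] += pr(t, u) * beta[t + 1, u]
            if u < U:
                beta[t, u] += pw(t, u) * beta[t, u + 1]

    # Expected latency = sum over WRITE edges of posterior probability x delay
    lat = 0.0
    for t in range(T + 1):
        for u in range(U):
            lat += alpha[t, u] * pw(t, u) * beta[t, u + 1] * t
    return lat
```

Because the delay decomposes additively over WRITE edges, the expectation over exponentially many paths reduces to a sum of edge posteriors, which is exactly what makes a forward-backward formulation (and hence gradient-based training of the policy) tractable.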
