Learning to Use Future Information in Simultaneous Translation

2021-01-01

Xueqing Wu, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Tao Qin, Tie-Yan Liu


Code

github.com/P2F-research/simulNMT
Official · In paper · PyTorch · ★ 1

Abstract

Simultaneous neural machine translation (NMT) has attracted much attention recently. In contrast to standard NMT, where the system can access the full input sentence, simultaneous NMT is a prefix-to-prefix problem: the system can use only a prefix of the input sentence, which introduces more uncertainty and difficulty into decoding. Wait-k inference is a simple yet effective strategy for simultaneous NMT, in which the decoder generates the output sequence k words behind the input. For wait-k inference, we observe that wait-m training with m>k (i.e., using more future information in training than in inference) generally outperforms wait-k training. Based on this observation, we propose a method that automatically learns how much future information to use in training for simultaneous NMT. Specifically, we introduce a controller that adaptively selects wait-m training strategies according to the network status of the translation model and the current training sentence pairs; the controller is trained jointly with the translation model through bi-level optimization. Experiments on four datasets show that our method improves over baselines by 1 to 3 BLEU points at the same latency. Our code is available at https://github.com/P2F-research/simulNMT.
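
To make the wait-k schedule concrete, here is a minimal sketch. It assumes the common formulation in which the decoder reads k source tokens before emitting the first target token and then reads one more source token per emitted token. The function names `visible_source_len` and `wait_k_mask` are hypothetical and are not taken from the linked repository; the sketch illustrates only the schedule, not the paper's controller or its bi-level optimization.

```python
from typing import List


def visible_source_len(t: int, k: int, src_len: int) -> int:
    """Source tokens readable when emitting target token t (1-indexed).

    Assumed schedule: min(k + t - 1, src_len).
    """
    if t < 1 or k < 1:
        raise ValueError("t and k must be >= 1")
    return min(k + t - 1, src_len)


def wait_k_mask(k: int, src_len: int, tgt_len: int) -> List[List[bool]]:
    """mask[i][j] is True when target position i+1 may use source position j+1."""
    return [
        [j < visible_source_len(i + 1, k, src_len) for j in range(src_len)]
        for i in range(tgt_len)
    ]


# Wait-m training with m > k exposes more future source tokens than
# wait-k inference. Illustrative values only: m=3 for training, k=1 at test time.
train_mask = wait_k_mask(k=3, src_len=5, tgt_len=4)
infer_mask = wait_k_mask(k=1, src_len=5, tgt_len=4)

for step, (tr, inf) in enumerate(zip(train_mask, infer_mask), start=1):
    print(f"target step {step}: train sees {sum(tr)} tokens, inference sees {sum(inf)}")
```

With these masks, every target step in training sees at least as many source tokens as the same step sees at inference, which is the "more future information" setting the abstract describes.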

Tasks

Decoder, Machine Translation, NMT, Sentence, Translation
