Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation

2022-05-01 · ACL 2022

Ruiqing Zhang, Zhongjun He, Hua Wu, Haifeng Wang

Abstract

End-to-end simultaneous speech-to-text translation aims to translate streaming source speech directly into target text with high translation quality and low latency. A typical simultaneous translation (ST) system consists of a speech translation model and a policy module, which determines when to wait and when to translate. The policy is therefore crucial for balancing translation quality and latency. Conventional methods usually adopt fixed policies, e.g., segmenting the source speech into fixed-length chunks and translating each chunk. However, such policies ignore contextual information and suffer from low translation quality. This paper proposes an adaptive segmentation policy for end-to-end ST. Inspired by human interpreters, the policy learns to segment the streaming source speech into meaningful units by considering both acoustic features and translation history, maintaining consistency between the segmentation and translation. Experimental results on English-German and Chinese-English show that our method achieves a better accuracy-latency trade-off than recently proposed state-of-the-art methods.
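
As a rough illustration of the fixed policies the abstract contrasts against, the sketch below cuts a stream of frames into fixed-length chunks and records how much source has been read at each point where a translation would be emitted. It is not the paper's method or code: the function names, the `segment_len` parameter, and the use of plain numbers as stand-ins for acoustic frames are all assumptions made for illustration, and no translation model is invoked.

```python
from typing import Iterator, List, Sequence


def fixed_length_segments(
    frames: Sequence[float], segment_len: int
) -> Iterator[List[float]]:
    """Yield consecutive chunks of `segment_len` frames (hypothetical helper).

    The final chunk may be shorter when the stream length is not a
    multiple of `segment_len`.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    for start in range(0, len(frames), segment_len):
        yield list(frames[start:start + segment_len])


def simulate_fixed_policy(frames: Sequence[float], segment_len: int) -> List[int]:
    """Return the number of frames read at each 'translate' action.

    A fixed policy waits for exactly `segment_len` new frames and then
    translates, regardless of what the frames contain.
    """
    frames_read = 0
    translate_points = []
    for segment in fixed_length_segments(frames, segment_len):
        frames_read += len(segment)          # wait: consume one chunk
        translate_points.append(frames_read)  # translate: boundary reached
    return translate_points


if __name__ == "__main__":
    stream = list(range(10))  # stand-in for 10 acoustic frames
    print(simulate_fixed_policy(stream, segment_len=4))
```

Because the boundaries depend only on the frame count, a chunk can end in the middle of a word or phrase. An adaptive policy of the kind the abstract describes would instead choose boundaries from the acoustic features and the translation history.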

Tasks

Segmentation, Simultaneous Speech-to-Text Translation, Speech-to-Text, Speech-to-Text Translation, Translation
