
Using Spoken Word Posterior Features in Neural Machine Translation

2018-10-01 · IWSLT (EMNLP) 2018

Kaho Osamura, Takatomo Kano, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura


Abstract

A spoken language translation (ST) system consists of at least two modules: an automatic speech recognition (ASR) system and a machine translation (MT) system. In most cases, the MT system is trained and optimized only on error-free text data, so when the ASR makes errors, translation accuracy degrades greatly. Existing studies have shown that training MT systems with ASR parameters or word lattices can improve translation quality, but such extensions require large changes to standard MT systems, resulting in complicated models that are hard to train. In this paper, a neural sequence-to-sequence ASR is used as a feature extractor trained to produce word posterior features given spoken utterances. The resulting probabilistic features are used to train a neural MT (NMT) system with only a slight modification. Experimental results reveal that the proposed method improves BLEU scores by up to 5.8 points with synthesized speech and 4.3 points with natural speech, compared with a conventional cascade-based ST system that translates from the 1-best ASR candidates.
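The "slight modification" the abstract describes can be illustrated with a toy NumPy sketch (sizes and variable names are hypothetical, not from the paper): instead of embedding only the 1-best ASR word via a one-hot lookup, the NMT encoder input is the expected embedding under the ASR word posterior distribution at each position.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes for illustration only.
vocab_size, embed_dim, src_len = 8, 4, 3

# Embedding table of the NMT encoder.
embedding = rng.normal(size=(vocab_size, embed_dim))

# ASR word posteriors: one probability distribution over the source
# vocabulary per spoken-word position (each row sums to 1).
logits = rng.normal(size=(src_len, vocab_size))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Conventional cascade: embed only the 1-best ASR word (hard lookup).
one_best = posteriors.argmax(axis=1)
hard_inputs = embedding[one_best]

# Posterior-feature input: expected embedding under the ASR posterior,
# i.e. a probability-weighted mix of all word embeddings per position.
soft_inputs = posteriors @ embedding

print(hard_inputs.shape, soft_inputs.shape)  # both (3, 4)
```

Both variants produce encoder inputs of the same shape, which is why the NMT architecture itself needs only a minimal change; the soft variant simply preserves the ASR's uncertainty instead of discarding it at the 1-best step.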
