Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

2017-06-08

Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan

Code

github.com/mnm-rnd/elsa-voice-asr (PyTorch)
github.com/park-cheol/ASR-Transformer (PyTorch)
github.com/s3prl/End-to-end-ASR-Pytorch (PyTorch)
github.com/neil-zeng/asr (PyTorch)

Abstract

We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During the beam search process, we combine the CTC predictions, the attention-based decoder predictions and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model beats out traditional hybrid ASR systems.
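
The abstract says that beam search combines three sources of evidence: the CTC predictions, the attention-based decoder predictions, and a separately trained LSTM language model. It does not state how they are combined, so the following is only a minimal sketch that assumes a log-linear interpolation of per-hypothesis log-probabilities. The function names, the weights `ctc_weight` and `lm_weight`, and the toy scores are all hypothetical and are not taken from the paper or from any of the listed repositories.

```python
def joint_score(log_p_ctc, log_p_att, log_p_lm, ctc_weight=0.3, lm_weight=0.1):
    """Combine three log-probabilities for one hypothesis.

    Assumption: a log-linear interpolation, where ctc_weight trades off
    CTC against attention and lm_weight scales the language model term.
    The default weights are illustrative, not values from the paper.
    """
    return (
        ctc_weight * log_p_ctc
        + (1.0 - ctc_weight) * log_p_att
        + lm_weight * log_p_lm
    )


def rescore(hypotheses, ctc_weight=0.3, lm_weight=0.1):
    """Sort hypotheses by joint score, best first.

    Each hypothesis is a tuple: (text, log_p_ctc, log_p_att, log_p_lm).
    """
    return sorted(
        hypotheses,
        key=lambda h: joint_score(h[1], h[2], h[3], ctc_weight, lm_weight),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the functions.
    candidates = [
        ("hypothesis A", -5.0, -1.0, -1.0),
        ("hypothesis B", -1.0, -1.5, -1.0),
    ]
    for text, *scores in rescore(candidates, ctc_weight=0.5, lm_weight=0.0):
        print(text, joint_score(*scores, ctc_weight=0.5, lm_weight=0.0))
```

This sketch rescores complete hypotheses only. A decoder that applies the combination inside beam search, as the abstract describes, would need the scores of partial hypotheses at every expansion step, which is not shown here.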

Tasks

Automatic Speech Recognition (ASR), Decoder, General Classification, Language Modelling, Speech Recognition
