A Transformer with Interleaved Self-attention and Convolution for Hybrid Acoustic Models

2019-10-23

Liang Lu

Code

github.com/balan/text-to-speech
PyTorch, 7 stars

Abstract

The Transformer with self-attention has achieved great success in natural language processing. Recently, there have been a few studies on Transformers for end-to-end speech recognition, but their application to hybrid acoustic models is still very limited. In this paper, we revisit the Transformer-based hybrid acoustic model and propose a model structure with interleaved self-attention and 1D convolution, which is shown to converge faster and reach higher recognition accuracy. We also study several aspects of the Transformer model, including the impact of the positional encoding feature, dropout regularization, and training with and without time restriction. We show competitive recognition results on the public LibriSpeech dataset when compared to the Kaldi baseline at both the cross-entropy training and sequence training stages. For reproducible research, we release our source code and recipe within the PyKaldi2 toolbox.
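The abstract mentions training with and without time restriction but does not define it. As a rough illustration only, the sketch below assumes that time restriction means each frame may attend only to a fixed window of neighboring frames. The function names, the window sizes, and the pure-Python formulation are hypothetical; none of them are taken from the paper or from the PyKaldi2 recipe.

```python
import math


def time_restricted_mask(num_frames, left, right):
    """Return mask[t][s], True when frame t may attend to frame s.

    Assumption: the restriction is a fixed window of `left` past frames
    and `right` future frames around each query frame.
    """
    return [
        [t - left <= s <= t + right for s in range(num_frames)]
        for t in range(num_frames)
    ]


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, ignoring masked-out frames."""
    allowed = [x for x, keep in zip(scores, mask_row) if keep]
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [
        math.exp(x - peak) if keep else 0.0
        for x, keep in zip(scores, mask_row)
    ]
    total = sum(exps)
    return [e / total for e in exps]


# Five frames, one frame of context on each side (hypothetical sizes).
mask = time_restricted_mask(num_frames=5, left=1, right=1)

# Uniform scores for the middle frame: weight is spread over the window only.
weights = masked_softmax([0.0, 0.0, 0.0, 0.0, 0.0], mask[2])
```

Under this assumption, training without time restriction would correspond to a mask that is True everywhere, so every frame attends to the whole utterance.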

Tasks

Speech Recognition
