Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition

2021-07-02

Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt

Code

github.com/freewym/espresso
Official, in paper, PyTorch, ★ 940

Abstract

Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in this paper we introduce the concept of relaxed attention, which is a simple gradual injection of a uniform distribution to the encoder-decoder attention weights during training that is easily implemented with two lines of code. We investigate the effect of relaxed attention across different AED model architectures and two prominent ASR tasks, Wall Street Journal (WSJ) and Librispeech. We found that transformers trained with relaxed attention outperform the standard baseline models consistently during decoding with external language models. On WSJ, we set a new benchmark for transformer-based end-to-end speech recognition with a word error rate of 3.65%, outperforming state of the art (4.20%) by 13.1% relative, while introducing only a single hyperparameter.
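The abstract describes relaxed attention as injecting a uniform distribution into the encoder-decoder attention weights during training. A minimal sketch of that idea follows, in plain Python for a single decoder step. It assumes the injection is a convex combination controlled by one coefficient; the name `gamma`, the helper names, and the example scores are illustrative assumptions, not taken from the paper's code. It also uses a fixed coefficient, so it does not model whatever the abstract means by "gradual".

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relax_attention(weights, gamma, training=True):
    """Blend attention weights with a uniform distribution (assumed form).

    weights: attention weights over T encoder frames, summing to 1.
    gamma:   hypothetical name for the single relaxation hyperparameter,
             expected to lie in [0, 1].
    """
    if not training or gamma == 0.0:
        # Assumption: relaxation is applied only while training.
        return list(weights)
    T = len(weights)
    # The "two lines" idea: scale the weights down, add back uniform mass.
    return [(1.0 - gamma) * w + gamma / T for w in weights]


scores = [4.0, 1.0, 0.5, 0.2]  # made-up attention logits for one decoder step
peaked = softmax(scores)
relaxed = relax_attention(peaked, gamma=0.25)
```

Because both the original weights and the uniform distribution sum to one, the blended weights still sum to one, while the largest weight shrinks. That flattening is one way to read the abstract's remark about addressing overconfidence.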

Tasks

Automatic Speech Recognition, Automatic Speech Recognition (ASR), Decoder, speech-recognition, Speech Recognition

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
LibriSpeech test-other	Conformer with Relaxed Attention	Word Error Rate (WER)	6.85	—	Unverified
WSJ eval92	Transformer with Relaxed Attention	Word Error Rate (WER)	3.19	—	Unverified
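For readers unfamiliar with the metric in the table, the sketch below shows the conventional definition of word error rate (word-level edit distance divided by the number of reference words, as a percentage) and how a relative reduction such as the abstract's 13.1% figure is derived from two WER values. The function names are illustrative, and this is not the scoring tool used in the paper; real evaluations typically apply text normalization that is omitted here.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference word missing from hyp
            insertion = curr[j - 1] + 1  # extra word in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Relative improvement in percent of `improved` over `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# One dropped word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")

# WER values quoted in the abstract: 4.20% prior state of the art, 3.65% new.
abstract_gain = relative_reduction(4.20, 3.65)
```

With the abstract's numbers, `relative_reduction(4.20, 3.65)` rounds to 13.1, matching the stated relative improvement.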
