Global Normalization for Streaming Speech Recognition in a Modular Framework

2022-05-26Code Available1· sign in to hype

Ehsan Variani, Ke wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

Code Available — Be the first to reproduce this paper.

Code

github.com/google-research/last
jax★ 47

Abstract

We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. Our solution admits a tractable exact computation of the denominator for the sequence-level normalization. Through theoretical and empirical results, we demonstrate that by switching to a globally normalized model, the word error rate gap between streaming and non-streaming speech-recognition models can be greatly reduced (by more than 50\% on the Librispeech dataset). This model is developed in a modular framework which encompasses all the common neural speech recognition models. The modularity of this framework enables controlled comparison of modelling choices and creation of new models.

Tasks

speech-recognition Speech Recognition

Global Normalization for Streaming Speech Recognition in a Modular Framework

Code

Abstract

Tasks

Reproductions