Deep Speech: Scaling up end-to-end speech recognition

2014-12-17

Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng

Code

Repository	Framework	Stars
github.com/Picovoice/stt-benchmark	none	687
github.com/Picovoice/speech-to-text-benchmark	none	687
github.com/robmsmt/KerasDeepSpeech	tf	243
github.com/GeorgeFedoseev/DeepSpeech	tf	83
github.com/bjtommychen/Keras_DeepSpeech2_SpeechRecognition	tf	3
github.com/YuBeomGon/DeepSpeech	tf	0
github.com/soarsmu/crossasr	paddle	0
github.com/WalterJohnson0/DeepSpeech-KerasRebuild	tf	0
github.com/Digital-Umuganda/Deepspeech-Kinyarwanda	tf	0
github.com/IBM/MAX-Speech-to-Text-Converter	tf	0

Abstract

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.

Tasks

Accented Speech Recognition, Speech Recognition

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
VoxForge American-Canadian	Deep Speech	Percentage error	15.01	—	Unverified
VoxForge Commonwealth	Deep Speech	Percentage error	28.46	—	Unverified
VoxForge European	Deep Speech	Percentage error	31.2	—	Unverified
VoxForge Indian	Deep Speech	Percentage error	45.35	—	Unverified
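
The table reports "Percentage error" without defining it. Assuming this means word error rate (WER), the usual metric for speech recognition benchmarks, a minimal sketch of the computation might look like the following. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not taken from the paper or from any of the listed repositories.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: word-level edit distance / reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Under this definition the score can exceed 100 when the hypothesis contains many inserted words, so it is an error rate relative to the reference length rather than a bounded percentage.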
