Robust Speech Recognition via Large-Scale Weak Supervision

2022-12-06 · Preprint 2022 · Code Available

Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever

Abstract

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Common Voice English | Whisper (Large v2) | Word Error Rate (WER) | 9.4 | — | Unverified |
| Common Voice French | Whisper (Large v2) | Test WER | 13.9 | — | Unverified |
| Common Voice German | Whisper (Large v2) | Test WER | 6.4 | — | Unverified |
| Common Voice Italian | Whisper (Large v2) | Test WER | 7.1 | — | Unverified |
| Common Voice Japanese | Whisper (Large v2) | Test WER | 9.1 | — | Unverified |
| Common Voice Russian | Whisper (Large v2) | Test WER | 7.1 | — | Unverified |
| Common Voice Spanish | Whisper (Large v2) | Test WER | 5.6 | — | Unverified |
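Word Error Rate, the metric reported above, is the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal plain-Python sketch is below; note that the numbers reported in the paper also depend on a text-normalization step (casing, punctuation, spelled-out numbers), which this sketch omits.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length.

    Illustrative only; production evaluations (e.g. for the table above)
    apply a text normalizer to both strings before comparison.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between the first i ref words and first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words -> WER of 1/6, i.e. ~16.7%
print(round(100 * wer("the cat sat on the mat", "the cat sat on mat"), 1))
```

A claimed value of 9.4 in the table therefore means roughly 9.4 word errors per 100 reference words on that test set.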

Reproductions