Speech Recognition
Speech recognition is the task of converting spoken language into text: recognizing the words spoken in an audio signal and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, that stays robust to factors such as accents, speaking speed, and background noise.
Datasets: LibriSpeech test-clean, LibriSpeech test-other, Switchboard + Hub500, TIMIT, AISHELL-1, WSJ eval92, Common Voice German, swb_hub_500 WER fullSWBCH, TUDA, Common Voice French, Common Voice Spanish, MediaSpeech
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSNN | Percentage error | 33.2 | — | Unverified |
| 2 | LAS multitask with indicators sampling | Percentage error | 20.4 | — | Unverified |
| 3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 | — | Unverified |
| 4 | QCNN-10L-256FM | Percentage error | 19.64 | — | Unverified |
| 5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 | — | Unverified |
| 6 | Bi-RNN + Attention | Percentage error | 17.6 | — | Unverified |
| 7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 | — | Unverified |
| 8 | Light Gated Recurrent Units | Percentage error | 16.7 | — | Unverified |
| 9 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 | — | Unverified |
| 10 | GRU | Percentage error | 16.6 | — | Unverified |
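
The table reports a "Percentage error" metric without defining it. A minimal sketch, assuming it is an edit-distance-based error rate of the kind commonly used in speech recognition (word or phone error rate): the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. The function names and the example sentences below are illustrative assumptions, not taken from any of the listed papers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis token)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate_percent(ref, hyp):
    """Edit distance normalized by reference length, as a percentage."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the").
reference = "the cat sat on the mat".split()
hypothesis = "the cat sit on mat".split()
print(f"{error_rate_percent(reference, hypothesis):.1f}")  # 33.3
```

The same functions work for phone-level scoring if the inputs are lists of phone labels instead of words. Because the count is normalized by the reference length only, a hypothesis with many insertions can score above 100.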