SOTAVerified

Speech Recognition

Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into written form. The goal is to transcribe speech accurately, either in real time or from recorded audio, while coping with factors such as accents, speaking rate, and background noise.

(Image credit: SpecAugment)

Papers

Showing 3001–3050 of 6433 papers

Title | Status | Hype
RED-ACE: Robust Error Detection for ASR using Confidence Embeddings | | 0
Recent Progress in the CUHK Dysarthric Speech Recognition System | | 0
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition | | 0
Investigation of Data Augmentation Techniques for Disordered Speech Recognition | | 0
The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition | | 0
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition | | 0
A Likelihood Ratio based Domain Adaptation Method for E2E Models | | 0
Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection | | 0
Two-Pass End-to-End ASR Model Compression | | 0
Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks | Code | 0
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset | | 0
Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition | | 0
Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question | | 0
Robust Natural Language Processing: Recent Advances, Challenges, and Future Directions | | 0
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation | | 0
Temporal Attention Augmented Transformer Hawkes Process | | 0
Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech | | 0
Multi-Dialect Arabic Speech Recognition | | 0
Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition | | 0
TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations | | 0
VoiceMoji: A Novel On-Device Pipeline for Seamless Emoji Insertion in Dictation | | 0
Voice Quality and Pitch Features in Transformer-Based Speech Recognition | | 0
Load-balanced Gather-scatter Patterns for Sparse Deep Neural Networks | | 0
Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching | | 0
Multi-turn RNN-T for streaming recognition of multi-party speech | | 0
Investigation of Densely Connected Convolutional Networks with Domain Adversarial Learning for Noise Robust Speech Recognition | | 0
Continual Learning for Monolingual End-to-End Automatic Speech Recognition | Code | 0
A singular Riemannian geometry approach to Deep Neural Networks I. Theoretical foundations | | 0
Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems | | 0
On the Use of External Data for Spoken Named Entity Recognition | Code | 0
Improving Hybrid CTC/Attention End-to-end Speech Recognition with Pretrained Acoustic and Language Model | | 0
Robustifying automatic speech recognition by extracting slowly varying features | | 0
ImportantAug: a data augmentation agent for speech | Code | 0
Real-Time Neural Voice Camouflage | | 0
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition | | 0
Improving Speech Recognition on Noisy Speech via Speech Enhancement with Multi-Discriminators CycleGAN | | 0
Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks | | 0
Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech | | 0
Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems | | 0
Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition | | 0
Sequence-level self-learning with multiple hypotheses | | 0
Are E2E ASR models ready for an industrial usage? | | 0
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading | | 0
A study on native American English speech recognition by Indian listeners with varying word familiarity level | | 0
BBS-KWS:The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge | | 0
Catch Me If You Can: Blackbox Adversarial Attacks on Automatic Speech Recognition using Frequency Masking | | 0
A Mixture of Expert Based Deep Neural Network for Improved ASR | | 0
Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent | | 0
A higher order Minkowski loss for improved prediction ability of acoustic model in ASR | | 0
An End-to-End Speech Recognition for the Nepali Language | | 0
Page 61 of 129

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AmNet | Word Error Rate (WER) | 8.6 | | Unverified
2 | HMM-(SAT)GMM | Word Error Rate (WER) | 8 | | Unverified
3 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 7.19 | | Unverified
4 | Snips | Word Error Rate (WER) | 6.4 | | Unverified
5 | Li-GRU | Word Error Rate (WER) | 6.2 | | Unverified
6 | HMM-DNN + pNorm* | Word Error Rate (WER) | 5.5 | | Unverified
7 | CTC + policy learning | Word Error Rate (WER) | 5.42 | | Unverified
8 | Deep Speech 2 | Word Error Rate (WER) | 5.33 | | Unverified
9 | Gated ConvNets | Word Error Rate (WER) | 4.8 | | Unverified
10 | HMM-TDNN + iVectors | Word Error Rate (WER) | 4.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 20.84 | | Unverified
2 | Snips | Word Error Rate (WER) | 16.5 | | Unverified
3 | Local Prior Matching (Large Model, ConvLM LM) | Word Error Rate (WER) | 15.28 | | Unverified
4 | Deep Speech 2 | Word Error Rate (WER) | 13.25 | | Unverified
5 | TDNN + pNorm + speed up/down speech | Word Error Rate (WER) | 12.5 | | Unverified
6 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 10.65 | | Unverified
7 | Convolutional Speech Recognition | Word Error Rate (WER) | 10.47 | | Unverified
8 | MT4SSL | Word Error Rate (WER) | 9.6 | | Unverified
9 | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 | | Unverified
10 | Espresso | Word Error Rate (WER) | 8.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Deep Speech | Percentage error | 20 | | Unverified
2 | DNN-HMM | Percentage error | 18.5 | | Unverified
3 | CD-DNN | Percentage error | 16.1 | | Unverified
4 | DNN | Percentage error | 16 | | Unverified
5 | DNN + Dropout | Percentage error | 15 | | Unverified
6 | DNN BMMI | Percentage error | 12.9 | | Unverified
7 | HMM-TDNN + pNorm + speed up/down speech | Percentage error | 12.9 | | Unverified
8 | DNN MPE | Percentage error | 12.9 | | Unverified
9 | DNN MMI | Percentage error | 12.9 | | Unverified
10 | CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB | Percentage error | 12.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSNN | Percentage error | 33.2 | | Unverified
2 | LAS multitask with indicators sampling | Percentage error | 20.4 | | Unverified
3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 | | Unverified
4 | QCNN-10L-256FM | Percentage error | 19.64 | | Unverified
5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 | | Unverified
6 | Bi-RNN + Attention | Percentage error | 17.6 | | Unverified
7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 | | Unverified
8 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 | | Unverified
9 | Light Gated Recurrent Units | Percentage error | 16.7 | | Unverified
10 | GRU | Percentage error | 16.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Att | Word Error Rate (WER) | 18.7 | | Unverified
2 | CTC/Att | Word Error Rate (WER) | 6.7 | | Unverified
3 | BRA-E | Word Error Rate (WER) | 6.63 | | Unverified
4 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 6.34 | | Unverified
5 | BAT | Word Error Rate (WER) | 4.97 | | Unverified
6 | Paraformer | Word Error Rate (WER) | 4.95 | | Unverified
7 | U2 | Word Error Rate (WER) | 4.72 | | Unverified
8 | UMA | Word Error Rate (WER) | 4.7 | | Unverified
9 | Lightweight Transducer | Word Error Rate (WER) | 4.31 | | Unverified
10 | CIF-HKD With LM | Word Error Rate (WER) | 4.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Jasper 10x3 | Word Error Rate (WER) | 6.9 | | Unverified
2 | CNN over RAW speech (wav) | Word Error Rate (WER) | 5.6 | | Unverified
3 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 3.79 | | Unverified
4 | Deep Speech 2 | Word Error Rate (WER) | 3.6 | | Unverified
5 | test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | Word Error Rate (WER) | 3.6 | | Unverified
6 | TC-DNN-BLSTM-DNN | Word Error Rate (WER) | 3.5 | | Unverified
7 | Convolutional Speech Recognition | Word Error Rate (WER) | 3.5 | | Unverified
8 | Espresso | Word Error Rate (WER) | 3.4 | | Unverified
9 | CTC-CRF VGG-BLSTM | Word Error Rate (WER) | 3.2 | | Unverified
10 | Transformer with Relaxed Attention | Word Error Rate (WER) | 3.19 | | Unverified
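
Most of the benchmark results above are reported as Word Error Rate (WER), usually defined as (S + D + I) / N: the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the system output, divided by the number of words in the reference. The Python snippet below is a minimal sketch of that definition using a word-level edit distance. It assumes plain whitespace tokenization and no text normalization (casing, punctuation, number formatting), so it is not the scoring tool behind any of the listed results. The function name word_error_rate is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N as a fraction (multiply by 100 for %).

    Sketch only: assumes whitespace tokenization and no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(f"WER = {100 * word_error_rate(ref, hyp):.2f}%")  # prints "WER = 33.33%"
```

Because insertions are counted against the reference length, WER is not bounded by 100%: a hypothesis much longer than the reference can score above 1.0.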