SOTAVerified

Speech Recognition

Speech Recognition is the task of converting spoken language into text: recognizing the words spoken in an audio signal and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, that holds up across accents, speaking speeds, and background noise.
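Transcription accuracy on this page is mostly reported as Word Error Rate (WER). As a rough illustration of what that number means, the sketch below computes WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, expressed as a percentage. The function name `word_error_rate` and the plain whitespace tokenization are assumptions made for this example; individual benchmarks typically apply their own text normalization and scoring tools, so this is not the scoring procedure behind any listed result.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (S + D + I) / number of reference words.

    Assumption: words are obtained by whitespace splitting, with no case
    folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.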


Papers

Showing 3001–3050 of 6433 papers

| Title | Status | Hype |
| --- | --- | --- |
| Improving noise robust automatic speech recognition with single-channel time-domain enhancement network | | 0 |
| Improving Noise Robustness of an End-to-End Neural Model for Automatic Speech Recognition | | 0 |
| Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning | | 0 |
| Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction | | 0 |
| Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging | | 0 |
| Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models | | 0 |
| Character-aware audio-visual subtitling in context | | 0 |
| Adversarial Joint Training with Self-Attention Mechanism for Robust End-to-End Speech Recognition | | 0 |
| A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation | | 0 |
| Improving Proper Noun Recognition in End-to-End ASR By Customization of the MWER Loss Criterion | | 0 |
| Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask | | 0 |
| Improving Punctuation Restoration for Speech Transcripts via External Data | | 0 |
| Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition | | 0 |
| Improving Readability for Automatic Speech Recognition Transcription | | 0 |
| Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data | | 0 |
| Improving RNN-T ASR Accuracy Using Context Audio | | 0 |
| Improving RNN-T ASR Performance with Date-Time and Location Awareness | | 0 |
| Fashioning Local Designs from Generic Speech Technologies in an Australian Aboriginal Community | | 0 |
| Character-Aware Attention-Based End-to-End Speech Recognition | | 0 |
| Improving RNN transducer with normalized jointer network | | 0 |
| Improving Robustness of Neural Inverse Text Normalization via Data-Augmentation, Semi-Supervised Learning, and Post-Aligning Method | | 0 |
| Improving Scheduled Sampling for Neural Transducer-based ASR | | 0 |
| FARMI: A FrAmework for Recording Multi-Modal Interactions | | 0 |
| Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses | | 0 |
| Character and Subword-Based Word Representation for Neural Language Modeling Prediction | | 0 |
| A Probabilistic Framework for Representing Dialog Systems and Entropy-Based Dialog Management through Dynamic Stochastic State Evolution | | 0 |
| Falling silent, lost for words ... Tracing personal involvement in interviews with Dutch war veterans | | 0 |
| Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs | | 0 |
| Improving Speech Recognition Accuracy of Local POI Using Geographical Models | | 0 |
| Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit | | 0 |
| Chaotic Variational Auto encoder-based Adversarial Machine Learning | | 0 |
| Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers | | 0 |
| Improving Speech Recognition for African American English With Audio Classification | | 0 |
| Improving Speech Recognition for Indic Languages using Language Model | | 0 |
| Improving Speech Recognition for the Elderly: A New Corpus of Elderly Japanese Speech and Investigation of Acoustic Modeling for Speech Recognition | | 0 |
| Improving speech recognition models with small samples for air traffic control systems | | 0 |
| Improving Speech Recognition on Noisy Speech via Speech Enhancement with Multi-Discriminators CycleGAN | | 0 |
| Improving Speech-to-Speech Translation Through Unlabeled Text | | 0 |
| Fairness of Automatic Speech Recognition in Cleft Lip and Palate Speech | | 0 |
| Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data | | 0 |
| Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies | | 0 |
| Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning | | 0 |
| FairLENS: Assessing Fairness in Law Enforcement Speech Recognition | | 0 |
| Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target | | 0 |
| Improving the fusion of acoustic and text representations in RNN-T | | 0 |
| Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning | | 0 |
| CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi | | 0 |
| Improving the Intent Classification accuracy in Noisy Environment | | 0 |
| Improving the Interpretability of Deep Neural Networks with Knowledge Distillation | | 0 |
| A Probabilistic Approach for Confidence Scoring in Speech Recognition | | 0 |
Page 61 of 129

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AmNet | Word Error Rate (WER) | 8.6 | | Unverified |
| 2 | HMM-(SAT)GMM | Word Error Rate (WER) | 8 | | Unverified |
| 3 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 7.19 | | Unverified |
| 4 | Snips | Word Error Rate (WER) | 6.4 | | Unverified |
| 5 | Li-GRU | Word Error Rate (WER) | 6.2 | | Unverified |
| 6 | HMM-DNN + pNorm* | Word Error Rate (WER) | 5.5 | | Unverified |
| 7 | CTC + policy learning | Word Error Rate (WER) | 5.42 | | Unverified |
| 8 | Deep Speech 2 | Word Error Rate (WER) | 5.33 | | Unverified |
| 9 | Gated ConvNets | Word Error Rate (WER) | 4.8 | | Unverified |
| 10 | HMM-TDNN + iVectors | Word Error Rate (WER) | 4.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 20.84 | | Unverified |
| 2 | Snips | Word Error Rate (WER) | 16.5 | | Unverified |
| 3 | Local Prior Matching (Large Model, ConvLM LM) | Word Error Rate (WER) | 15.28 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 13.25 | | Unverified |
| 5 | TDNN + pNorm + speed up/down speech | Word Error Rate (WER) | 12.5 | | Unverified |
| 6 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 10.65 | | Unverified |
| 7 | Convolutional Speech Recognition | Word Error Rate (WER) | 10.47 | | Unverified |
| 8 | MT4SSL | Word Error Rate (WER) | 9.6 | | Unverified |
| 9 | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 | | Unverified |
| 10 | Espresso | Word Error Rate (WER) | 8.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Deep Speech | Percentage error | 20 | | Unverified |
| 2 | DNN-HMM | Percentage error | 18.5 | | Unverified |
| 3 | CD-DNN | Percentage error | 16.1 | | Unverified |
| 4 | DNN | Percentage error | 16 | | Unverified |
| 5 | DNN + Dropout | Percentage error | 15 | | Unverified |
| 6 | DNN BMMI | Percentage error | 12.9 | | Unverified |
| 7 | HMM-TDNN + pNorm + speed up/down speech | Percentage error | 12.9 | | Unverified |
| 8 | DNN MPE | Percentage error | 12.9 | | Unverified |
| 9 | DNN MMI | Percentage error | 12.9 | | Unverified |
| 10 | CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB | Percentage error | 12.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | LSNN | Percentage error | 33.2 | | Unverified |
| 2 | LAS multitask with indicators sampling | Percentage error | 20.4 | | Unverified |
| 3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 | | Unverified |
| 4 | QCNN-10L-256FM | Percentage error | 19.64 | | Unverified |
| 5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 | | Unverified |
| 6 | Bi-RNN + Attention | Percentage error | 17.6 | | Unverified |
| 7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 | | Unverified |
| 8 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 | | Unverified |
| 9 | Light Gated Recurrent Units | Percentage error | 16.7 | | Unverified |
| 10 | GRU | Percentage error | 16.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Att | Word Error Rate (WER) | 18.7 | | Unverified |
| 2 | CTC/Att | Word Error Rate (WER) | 6.7 | | Unverified |
| 3 | BRA-E | Word Error Rate (WER) | 6.63 | | Unverified |
| 4 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 6.34 | | Unverified |
| 5 | BAT | Word Error Rate (WER) | 4.97 | | Unverified |
| 6 | Paraformer | Word Error Rate (WER) | 4.95 | | Unverified |
| 7 | U2 | Word Error Rate (WER) | 4.72 | | Unverified |
| 8 | UMA | Word Error Rate (WER) | 4.7 | | Unverified |
| 9 | Lightweight Transducer | Word Error Rate (WER) | 4.31 | | Unverified |
| 10 | CIF-HKD With LM | Word Error Rate (WER) | 4.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Jasper 10x3 | Word Error Rate (WER) | 6.9 | | Unverified |
| 2 | CNN over RAW speech (wav) | Word Error Rate (WER) | 5.6 | | Unverified |
| 3 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 3.79 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 3.6 | | Unverified |
| 5 | test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | Word Error Rate (WER) | 3.6 | | Unverified |
| 6 | TC-DNN-BLSTM-DNN | Word Error Rate (WER) | 3.5 | | Unverified |
| 7 | Convolutional Speech Recognition | Word Error Rate (WER) | 3.5 | | Unverified |
| 8 | Espresso | Word Error Rate (WER) | 3.4 | | Unverified |
| 9 | CTC-CRF VGG-BLSTM | Word Error Rate (WER) | 3.2 | | Unverified |
| 10 | Transformer with Relaxed Attention | Word Error Rate (WER) | 3.19 | | Unverified |