SOTAVerified

Speech Recognition

Speech Recognition is the task of converting spoken language into text: recognizing the words spoken in an audio signal and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, that stays robust to factors such as accents, speaking speed, and background noise.


Papers

Showing 3301 to 3350 of 6433 papers

Title | Status | Hype
4-bit Quantization of LSTM-based Speech Recognition Models | | 0
Task-aware Warping Factors in Mask-based Speech Enhancement | | 0
Exploring Retraining-Free Speech Recognition for Intra-sentential Code-Switching | | 0
Full Attention Bidirectional Deep Learning Structure for Single Channel Speech Enhancement | | 0
Position-Invariant Truecasing with a Word-and-Character Hierarchical Recurrent Neural Network | | 0
Cross-domain Single-channel Speech Enhancement Model with Bi-projection Fusion Module for Noise-robust ASR | | 0
Reducing Exposure Bias in Training Recurrent Neural Network Transducers | | 0
Graph Neural Networks: Methods, Applications, and Opportunities | | 0
Subject Envelope based Multitype Reconstruction Algorithm of Speech Samples of Parkinson's Disease | | 0
A Unified Transformer-based Framework for Duplex Text Normalization | | 0
Automatic Speech Recognition And Limited Vocabulary: A Survey | | 0
Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers | | 0
Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer | | 0
A Dual-Decoder Conformer for Multilingual Speech Recognition | | 0
Hierarchical Summarization for Longform Spoken Dialog | | 0
A Multi-level Acoustic Feature Extraction Framework for Transformer Based End-to-End Speech Recognition | | 0
A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems | | 0
DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants | | 0
Multilingual training set selection for ASR in under-resourced Malian languages | | 0
Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition | | 0
StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition | | 0
End-to-End Speech Recognition With Joint Dereverberation Of Sub-Band Autoregressive Envelopes | Code | 0
Time-Frequency Localization Using Deep Convolutional Maxout Neural Network in Persian Speech Recognition | | 0
The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation | | 0
An empirical assessment of deep learning approaches to task-oriented dialog management | | 0
Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading | | 0
Out-of-Domain Generalization from a Single Source: An Uncertainty Quantification Approach | | 0
Fast frequency modulation is encoded according to the listener expectations in the human subcortical auditory pathway | | 0
Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation | | 0
Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification | | 0
Unsupervised Domain Adaptation in Speech Recognition using Phonetic Features | | 0
Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-Temporal Sparsity | | 0
Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation | | 0
Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization | | 0
Learning a Neural Diff for Speech Models | | 0
Amortized Neural Networks for Low-Latency Speech Recognition | | 0
The Role of Phonetic Units in Speech Emotion Recognition | | 0
Adversarial Data Augmentation for Disordered Speech Recognition | | 0
Decoupling recognition and transcription in Mandarin ASR | | 0
Automatic recognition of suprasegmentals in speech | | 0
MOHAQ: Multi-Objective Hardware-Aware Quantization of Recurrent Neural Networks | | 0
User-Initiated Repetition-Based Recovery in Multi-Utterance Dialogue Systems | | 0
On Knowledge Distillation for Translating Erroneous Speech Transcriptions | | 0
基于改进Conformer的新闻领域端到端语音识别 (End-to-End Speech Recognition in News Field based on Conformer) | | 0
Avengers, Ensemble! Benefits of ensembling in grapheme-to-phoneme prediction | | 0
KIT’s IWSLT 2021 Offline Speech Translation System | | 0
Interactive Reinforcement Learning for Table Balancing Robot | | 0
Automatic generation of a 3D sign language avatar on AR glasses given 2D videos of human signers | | 0
QASR: QCRI Aljazeera Speech Resource A Large Scale Annotated Arabic Speech Corpus | | 0
IMS’ Systems for the IWSLT 2021 Low-Resource Speech Translation Task | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AmNet | Word Error Rate (WER) | 8.6 | | Unverified
2 | HMM-(SAT)GMM | Word Error Rate (WER) | 8 | | Unverified
3 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 7.19 | | Unverified
4 | Snips | Word Error Rate (WER) | 6.4 | | Unverified
5 | Li-GRU | Word Error Rate (WER) | 6.2 | | Unverified
6 | HMM-DNN + pNorm* | Word Error Rate (WER) | 5.5 | | Unverified
7 | CTC + policy learning | Word Error Rate (WER) | 5.42 | | Unverified
8 | Deep Speech 2 | Word Error Rate (WER) | 5.33 | | Unverified
9 | Gated ConvNets | Word Error Rate (WER) | 4.8 | | Unverified
10 | HMM-TDNN + iVectors | Word Error Rate (WER) | 4.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 20.84 | | Unverified
2 | Snips | Word Error Rate (WER) | 16.5 | | Unverified
3 | Local Prior Matching (Large Model, ConvLM LM) | Word Error Rate (WER) | 15.28 | | Unverified
4 | Deep Speech 2 | Word Error Rate (WER) | 13.25 | | Unverified
5 | TDNN + pNorm + speed up/down speech | Word Error Rate (WER) | 12.5 | | Unverified
6 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 10.65 | | Unverified
7 | Convolutional Speech Recognition | Word Error Rate (WER) | 10.47 | | Unverified
8 | MT4SSL | Word Error Rate (WER) | 9.6 | | Unverified
9 | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 | | Unverified
10 | Espresso | Word Error Rate (WER) | 8.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Deep Speech | Percentage error | 20 | | Unverified
2 | DNN-HMM | Percentage error | 18.5 | | Unverified
3 | CD-DNN | Percentage error | 16.1 | | Unverified
4 | DNN | Percentage error | 16 | | Unverified
5 | DNN + Dropout | Percentage error | 15 | | Unverified
6 | DNN BMMI | Percentage error | 12.9 | | Unverified
7 | HMM-TDNN + pNorm + speed up/down speech | Percentage error | 12.9 | | Unverified
8 | DNN MPE | Percentage error | 12.9 | | Unverified
9 | DNN MMI | Percentage error | 12.9 | | Unverified
10 | CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB | Percentage error | 12.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSNN | Percentage error | 33.2 | | Unverified
2 | LAS multitask with indicators sampling | Percentage error | 20.4 | | Unverified
3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 | | Unverified
4 | QCNN-10L-256FM | Percentage error | 19.64 | | Unverified
5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 | | Unverified
6 | Bi-RNN + Attention | Percentage error | 17.6 | | Unverified
7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 | | Unverified
8 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 | | Unverified
9 | Light Gated Recurrent Units | Percentage error | 16.7 | | Unverified
10 | GRU | Percentage error | 16.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Att | Word Error Rate (WER) | 18.7 | | Unverified
2 | CTC/Att | Word Error Rate (WER) | 6.7 | | Unverified
3 | BRA-E | Word Error Rate (WER) | 6.63 | | Unverified
4 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 6.34 | | Unverified
5 | BAT | Word Error Rate (WER) | 4.97 | | Unverified
6 | Paraformer | Word Error Rate (WER) | 4.95 | | Unverified
7 | U2 | Word Error Rate (WER) | 4.72 | | Unverified
8 | UMA | Word Error Rate (WER) | 4.7 | | Unverified
9 | Lightweight Transducer | Word Error Rate (WER) | 4.31 | | Unverified
10 | CIF-HKD With LM | Word Error Rate (WER) | 4.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Jasper 10x3 | Word Error Rate (WER) | 6.9 | | Unverified
2 | CNN over RAW speech (wav) | Word Error Rate (WER) | 5.6 | | Unverified
3 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 3.79 | | Unverified
4 | Deep Speech 2 | Word Error Rate (WER) | 3.6 | | Unverified
5 | test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | Word Error Rate (WER) | 3.6 | | Unverified
6 | TC-DNN-BLSTM-DNN | Word Error Rate (WER) | 3.5 | | Unverified
7 | Convolutional Speech Recognition | Word Error Rate (WER) | 3.5 | | Unverified
8 | Espresso | Word Error Rate (WER) | 3.4 | | Unverified
9 | CTC-CRF VGG-BLSTM | Word Error Rate (WER) | 3.2 | | Unverified
10 | Transformer with Relaxed Attention | Word Error Rate (WER) | 3.19 | | Unverified
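
For readers unfamiliar with the Word Error Rate (WER) metric used in several of the tables above, the following is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the absence of any text normalization are assumptions made for illustration; the listed papers may score their systems with different tooling and normalization rules.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction (multiply by 100 for a percentage).

    Hypothetical helper for illustration: tokens are split on
    whitespace and compared case-sensitively, with no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(f"WER: {100 * word_error_rate(ref, hyp):.2f}%")
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.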