SOTAVerified

Speech Recognition

Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, despite factors such as accents, speaking speed, and background noise.

(Image credit: SpecAugment)

Papers

Showing 4101–4150 of 6433 papers

| Title | Status | Hype |
|---|---|---|
| Learning of Time-Frequency Attention Mechanism for Automatic Modulation Recognition | | 0 |
| Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors | | 0 |
| Learning Online Alignments with Continuous Rewards Policy Gradient | | 0 |
| Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation | | 0 |
| Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization | | 0 |
| Learning Representations of Affect from Speech | | 0 |
| Learning Robust and Multilingual Speech Representations | | 0 |
| Learning Robust Dialog Policies in Noisy Environments | | 0 |
| Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech | | 0 |
| Learning Shared Encoding Representation for End-to-End Speech Recognition Models | | 0 |
| Learning Similarity Functions for Pronunciation Variations | | 0 |
| Learning Speech Rate in Speech Recognition | | 0 |
| Learning The Sequential Temporal Information with Recurrent Neural Networks | | 0 |
| Learning the Taxonomy of Function Words for Parsing | | 0 |
| Learning To Detect Keyword Parts And Whole By Smoothed Max Pooling | | 0 |
| Learning to Distill: The Essence Vector Modeling Framework | | 0 |
| Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition | | 0 |
| Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition | | 0 |
| Learning to Rank Intents in Voice Assistants | | 0 |
| Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition | | 0 |
| Learning to retrieve out-of-vocabulary words in speech recognition | | 0 |
| Learning When to Trust Which Teacher for Weakly Supervised ASR | | 0 |
| Learning with Inadequate and Incorrect Supervision | | 0 |
| Learning with Limited Samples – Meta-Learning and Applications to Communication Systems | | 0 |
| Learning with Noise-Contrastive Estimation: Easing training by learning to scale | | 0 |
| Learning without Forgetting: Task Aware Multitask Learning for Multi-Modality Tasks | | 0 |
| Learning Word-Level Confidence For Subword End-to-End ASR | | 0 |
| Learning Word-Like Units from Joint Audio-Visual Analysis | | 0 |
| Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition | | 0 |
| Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data | | 0 |
| LeBLEU: N-gram-based Translation Evaluation Score for Morphologically Complex Languages | | 0 |
| Lecture Translator - Speech translation framework for simultaneous lecture translation | | 0 |
| Lego-Features: Exporting modular encoder features for streaming and deliberation ASR | | 0 |
| LegoNN: Building Modular Encoder-Decoder Models | | 0 |
| Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency | | 0 |
| Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations | | 0 |
| Less is More: Accurate Speech Recognition & Translation without Web-Scale Data | | 0 |
| Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging | | 0 |
| LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models | | 0 |
| Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding | | 0 |
| Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and language Models for Intent Classification | | 0 |
| Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR | | 0 |
| 運用概念模型化技術於中文大詞彙連續語音辨識之語言模型調適 (Leveraging Concept Modeling Techniques for Language Model Adaptation in Mandarin Large Vocabulary Continuous Speech Recognition) [In Chinese] | | 0 |
| Leveraging Cross-Utterance Context For ASR Decoding | | 0 |
| Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition | | 0 |
| Leveraging Deep Neural Network Activation Entropy to cope with Unseen Data in Speech Recognition | | 0 |
| Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise | | 0 |
| Leveraging Effective Query Modeling Techniques for Speech Recognition and Summarization | | 0 |
| Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec | | 0 |
| Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AmNet | Word Error Rate (WER) | 8.6 | | Unverified |
| 2 | HMM-(SAT)GMM | Word Error Rate (WER) | 8 | | Unverified |
| 3 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 7.19 | | Unverified |
| 4 | Snips | Word Error Rate (WER) | 6.4 | | Unverified |
| 5 | Li-GRU | Word Error Rate (WER) | 6.2 | | Unverified |
| 6 | HMM-DNN + pNorm* | Word Error Rate (WER) | 5.5 | | Unverified |
| 7 | CTC + policy learning | Word Error Rate (WER) | 5.42 | | Unverified |
| 8 | Deep Speech 2 | Word Error Rate (WER) | 5.33 | | Unverified |
| 9 | HMM-TDNN + iVectors | Word Error Rate (WER) | 4.8 | | Unverified |
| 10 | Gated ConvNets | Word Error Rate (WER) | 4.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 20.84 | | Unverified |
| 2 | Snips | Word Error Rate (WER) | 16.5 | | Unverified |
| 3 | Local Prior Matching (Large Model, ConvLM LM) | Word Error Rate (WER) | 15.28 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 13.25 | | Unverified |
| 5 | TDNN + pNorm + speed up/down speech | Word Error Rate (WER) | 12.5 | | Unverified |
| 6 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 10.65 | | Unverified |
| 7 | Convolutional Speech Recognition | Word Error Rate (WER) | 10.47 | | Unverified |
| 8 | MT4SSL | Word Error Rate (WER) | 9.6 | | Unverified |
| 9 | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 | | Unverified |
| 10 | Espresso | Word Error Rate (WER) | 8.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Deep Speech | Percentage error | 20 | | Unverified |
| 2 | DNN-HMM | Percentage error | 18.5 | | Unverified |
| 3 | CD-DNN | Percentage error | 16.1 | | Unverified |
| 4 | DNN | Percentage error | 16 | | Unverified |
| 5 | DNN + Dropout | Percentage error | 15 | | Unverified |
| 6 | DNN BMMI | Percentage error | 12.9 | | Unverified |
| 7 | DNN MPE | Percentage error | 12.9 | | Unverified |
| 8 | DNN MMI | Percentage error | 12.9 | | Unverified |
| 9 | HMM-TDNN + pNorm + speed up/down speech | Percentage error | 12.9 | | Unverified |
| 10 | HMM-DNN +sMBR | Percentage error | 12.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSNN | Percentage error | 33.2 | | Unverified |
| 2 | LAS multitask with indicators sampling | Percentage error | 20.4 | | Unverified |
| 3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 | | Unverified |
| 4 | QCNN-10L-256FM | Percentage error | 19.64 | | Unverified |
| 5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 | | Unverified |
| 6 | Bi-RNN + Attention | Percentage error | 17.6 | | Unverified |
| 7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 | | Unverified |
| 8 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 | | Unverified |
| 9 | Light Gated Recurrent Units | Percentage error | 16.7 | | Unverified |
| 10 | GRU | Percentage error | 16.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Att | Word Error Rate (WER) | 18.7 | | Unverified |
| 2 | CTC/Att | Word Error Rate (WER) | 6.7 | | Unverified |
| 3 | BRA-E | Word Error Rate (WER) | 6.63 | | Unverified |
| 4 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 6.34 | | Unverified |
| 5 | BAT | Word Error Rate (WER) | 4.97 | | Unverified |
| 6 | Paraformer | Word Error Rate (WER) | 4.95 | | Unverified |
| 7 | U2 | Word Error Rate (WER) | 4.72 | | Unverified |
| 8 | UMA | Word Error Rate (WER) | 4.7 | | Unverified |
| 9 | Lightweight Transducer | Word Error Rate (WER) | 4.31 | | Unverified |
| 10 | CIF-HKD With LM | Word Error Rate (WER) | 4.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Jasper 10x3 | Word Error Rate (WER) | 6.9 | | Unverified |
| 2 | CNN over RAW speech (wav) | Word Error Rate (WER) | 5.6 | | Unverified |
| 3 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 3.79 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 3.6 | | Unverified |
| 5 | test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | Word Error Rate (WER) | 3.6 | | Unverified |
| 6 | Convolutional Speech Recognition | Word Error Rate (WER) | 3.5 | | Unverified |
| 7 | TC-DNN-BLSTM-DNN | Word Error Rate (WER) | 3.5 | | Unverified |
| 8 | Espresso | Word Error Rate (WER) | 3.4 | | Unverified |
| 9 | CTC-CRF VGG-BLSTM | Word Error Rate (WER) | 3.2 | | Unverified |
| 10 | Transformer with Relaxed Attention | Word Error Rate (WER) | 3.19 | | Unverified |
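
Most of the benchmark tables above report Word Error Rate (WER). As a rough illustration of what that number usually means, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, expressed in percent. The page does not say how each claimed figure was scored, so the whitespace tokenization, the absence of text normalization, and the function name `word_error_rate` are all assumptions made for this example, not the scoring procedure behind any entry listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors while the denominator is the reference length, WER computed this way can exceed 100 when the hypothesis is much longer than the reference.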