SOTAVerified

Speech Recognition

Speech Recognition is the task of converting spoken language into text: recognizing the words spoken in an audio signal and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, that remains robust to factors such as accents, speaking speed, and background noise.

(Image credit: SpecAugment)

Papers

Showing 2701–2750 of 6433 papers

| Title | Status | Hype |
| --- | --- | --- |
| Gradient Norm-based Fine-Tuning for Backdoor Defense in Automatic Speech Recognition | | 0 |
| homeService: Voice-enabled assistive technology in the home using cloud-based automatic speech recognition | | 0 |
| Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling | | 0 |
| Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition | | 0 |
| Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT | | 0 |
| Granary: Speech Recognition and Translation Dataset in 25 European Languages | | 0 |
| Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition | | 0 |
| Enhancing Multilingual Speech Recognition through Language Prompt Tuning and Frame-Level Language Adapter | | 0 |
| Graph Databases for Designing High-Performance Speech Recognition Grammars | | 0 |
| Graph Meets LLM: A Novel Approach to Collaborative Filtering for Robust Conversational Understanding | | 0 |
| Bridging the Gap between Spatial and Spectral Domains: A Survey on Graph Neural Networks | | 0 |
| Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling | | 0 |
| GRASS: the Graz corpus of Read And Spontaneous Speech | | 0 |
| Comparison of echo state network output layer classification methods on noisy data | | 0 |
| Grouping Language Model Boundary Words to Speed K--Best Extraction from Hypergraphs | | 0 |
| Grow and Prune Compact, Fast, and Accurate LSTMs | | 0 |
| Comparison of Grapheme-to-Phoneme Conversion Methods on a Myanmar Pronunciation Dictionary | | 0 |
| Guided contrastive self-supervised pre-training for automatic speech recognition | | 0 |
| Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR | | 0 |
| Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance | | 0 |
| Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation | | 0 |
| Comparison of Neural Network Architectures for Spectrum Sensing | | 0 |
| Gujarati-English Code-Switching Speech Recognition using ensemble prediction of spoken language | | 0 |
| Hallucination of speech recognition errors with sequence to sequence learning | | 0 |
| Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models | | 0 |
| Halving transcription time: A fast, user-friendly and GDPR-compliant workflow to create AI-assisted transcripts for content analysis | | 0 |
| Handling and extracting key entities from customer conversations using Speech recognition and Named Entity recognition | | 0 |
| Handwriting recognition for Scottish Gaelic | | 0 |
| Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling | | 0 |
| Hard Sample Mining for the Improved Retraining of Automatic Speech Recognition | | 0 |
| Another Point of View on Visual Speech Recognition | | 0 |
| Hardware-Guided Symbiotic Training for Compact, Accurate, yet Execution-Efficient LSTM | | 0 |
| Hardware Implementation of Hyperbolic Tangent Function using Catmull-Rom Spline Interpolation | | 0 |
| A Speech-enabled Fixed-phrase Translator for Healthcare Accessibility | | 0 |
| HARK Side of Deep Learning -- From Grad Student Descent to Automated Machine Learning | | 0 |
| Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models | | 0 |
| Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss | | 0 |
| Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition | | 0 |
| Harnessing Transfer Learning from Swahili: Advancing Solutions for Comorian Dialects | | 0 |
| HASP: A High-Performance Adaptive Mobile Security Enhancement Against Malicious Speech Recognition | | 0 |
| HausaNLP: Current Status, Challenges and Future Directions for Hausa Natural Language Processing | | 0 |
| Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition | | 0 |
| Head-synchronous Decoding for Transformer-based Streaming ASR | | 0 |
| Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers | | 0 |
| Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models | | 0 |
| Hearings and mishearings: decrypting the spoken word | | 0 |
| Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding | | 0 |
| Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems | | 0 |
| Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning | | 0 |
| Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap | | 0 |
Page 55 of 129

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AmNet | Word Error Rate (WER) | 8.6 | | Unverified |
| 2 | HMM-(SAT)GMM | Word Error Rate (WER) | 8 | | Unverified |
| 3 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 7.19 | | Unverified |
| 4 | Snips | Word Error Rate (WER) | 6.4 | | Unverified |
| 5 | Li-GRU | Word Error Rate (WER) | 6.2 | | Unverified |
| 6 | HMM-DNN + pNorm* | Word Error Rate (WER) | 5.5 | | Unverified |
| 7 | CTC + policy learning | Word Error Rate (WER) | 5.42 | | Unverified |
| 8 | Deep Speech 2 | Word Error Rate (WER) | 5.33 | | Unverified |
| 9 | HMM-TDNN + iVectors | Word Error Rate (WER) | 4.8 | | Unverified |
| 10 | Gated ConvNets | Word Error Rate (WER) | 4.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 20.84 | | Unverified |
| 2 | Snips | Word Error Rate (WER) | 16.5 | | Unverified |
| 3 | Local Prior Matching (Large Model, ConvLM LM) | Word Error Rate (WER) | 15.28 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 13.25 | | Unverified |
| 5 | TDNN + pNorm + speed up/down speech | Word Error Rate (WER) | 12.5 | | Unverified |
| 6 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 10.65 | | Unverified |
| 7 | Convolutional Speech Recognition | Word Error Rate (WER) | 10.47 | | Unverified |
| 8 | MT4SSL | Word Error Rate (WER) | 9.6 | | Unverified |
| 9 | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 | | Unverified |
| 10 | Espresso | Word Error Rate (WER) | 8.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Deep Speech | Percentage error | 20 | | Unverified |
| 2 | DNN-HMM | Percentage error | 18.5 | | Unverified |
| 3 | CD-DNN | Percentage error | 16.1 | | Unverified |
| 4 | DNN | Percentage error | 16 | | Unverified |
| 5 | DNN + Dropout | Percentage error | 15 | | Unverified |
| 6 | DNN BMMI | Percentage error | 12.9 | | Unverified |
| 7 | DNN MPE | Percentage error | 12.9 | | Unverified |
| 8 | DNN MMI | Percentage error | 12.9 | | Unverified |
| 9 | HMM-TDNN + pNorm + speed up/down speech | Percentage error | 12.9 | | Unverified |
| 10 | HMM-DNN +sMBR | Percentage error | 12.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | LSNN | Percentage error | 33.2 | | Unverified |
| 2 | LAS multitask with indicators sampling | Percentage error | 20.4 | | Unverified |
| 3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 | | Unverified |
| 4 | QCNN-10L-256FM | Percentage error | 19.64 | | Unverified |
| 5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 | | Unverified |
| 6 | Bi-RNN + Attention | Percentage error | 17.6 | | Unverified |
| 7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 | | Unverified |
| 8 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 | | Unverified |
| 9 | Light Gated Recurrent Units | Percentage error | 16.7 | | Unverified |
| 10 | GRU | Percentage error | 16.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Att | Word Error Rate (WER) | 18.7 | | Unverified |
| 2 | CTC/Att | Word Error Rate (WER) | 6.7 | | Unverified |
| 3 | BRA-E | Word Error Rate (WER) | 6.63 | | Unverified |
| 4 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 6.34 | | Unverified |
| 5 | BAT | Word Error Rate (WER) | 4.97 | | Unverified |
| 6 | Paraformer | Word Error Rate (WER) | 4.95 | | Unverified |
| 7 | U2 | Word Error Rate (WER) | 4.72 | | Unverified |
| 8 | UMA | Word Error Rate (WER) | 4.7 | | Unverified |
| 9 | Lightweight Transducer | Word Error Rate (WER) | 4.31 | | Unverified |
| 10 | CIF-HKD With LM | Word Error Rate (WER) | 4.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Jasper 10x3 | Word Error Rate (WER) | 6.9 | | Unverified |
| 2 | CNN over RAW speech (wav) | Word Error Rate (WER) | 5.6 | | Unverified |
| 3 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 3.79 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 3.6 | | Unverified |
| 5 | test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | Word Error Rate (WER) | 3.6 | | Unverified |
| 6 | Convolutional Speech Recognition | Word Error Rate (WER) | 3.5 | | Unverified |
| 7 | TC-DNN-BLSTM-DNN | Word Error Rate (WER) | 3.5 | | Unverified |
| 8 | Espresso | Word Error Rate (WER) | 3.4 | | Unverified |
| 9 | CTC-CRF VGG-BLSTM | Word Error Rate (WER) | 3.2 | | Unverified |
| 10 | Transformer with Relaxed Attention | Word Error Rate (WER) | 3.19 | | Unverified |
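
Most of the benchmark results above are reported as Word Error Rate (WER). As a rough illustration of how that metric is commonly computed, the sketch below counts the minimum number of word substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis, then divides by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the absence of any case or punctuation normalization are assumptions made for this example; the individual benchmarks may apply their own scoring conventions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level edit distance.

    Assumption for this sketch: words are split on whitespace and
    compared exactly (no lowercasing, no punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference, so it is an error rate rather than a true percentage of words wrong.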