SOTAVerified

Speech Recognition

Speech recognition is the task of converting spoken language into text: recognizing the words in an audio signal and transcribing them in written form. The goal is accurate transcription, whether in real time or from recorded audio, despite variation in accent, speaking rate, and background noise.


Papers

Showing 5751-5800 of 6433 papers

| Title | Status | Hype |
| --- | --- | --- |
| Segmental Recurrent Neural Networks for End-to-end Speech Recognition | | 0 |
| Adaptive Frequency Cepstral Coefficients for Word Mispronunciation Detection | | 0 |
| The IBM 2016 Speaker Recognition System | | 0 |
| Communication-Efficient Learning of Deep Networks from Decentralized Data | Code | 1 |
| Deep Learning on FPGAs: Past, Present, and Future | | 0 |
| Signer-independent Fingerspelling Recognition with Deep Neural Network Adaptation | | 0 |
| Lipreading with Long Short-Term Memory | | 0 |
| Intelligent Conversational Bot for Massive Online Open Courses (MOOCs) | | 0 |
| Character-Level Incremental Speech Recognition with Recurrent Neural Networks | Code | 0 |
| Automatic recognition of element classes and boundaries in the birdsong with variable sequences | | 0 |
| Manifold-Kernels Comparison in MKPLS for Visual Speech Recognition | | 0 |
| Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition | | 0 |
| Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model | | 0 |
| Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation | | 0 |
| Using Filter Banks in Convolutional Neural Networks for Texture Classification | Code | 0 |
| Evaluating the Performance of a Speech Recognition based System | | 0 |
| Environmental Noise Embeddings for Robust Speech Recognition | | 0 |
| Minimally Supervised Number Normalization | | 0 |
| Sparse Non-negative Matrix Language Modeling | | 0 |
| Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency | | 0 |
| Statistical and Computational Guarantees for the Baum-Welch Algorithm | | 0 |
| Recent Advances in Convolutional Neural Networks | | 0 |
| The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media | | 0 |
| Can Pretrained Neural Networks Detect Anatomy? | | 0 |
| Strategies for Training Large Vocabulary Neural Language Models | Code | 0 |
| Small-footprint Deep Neural Networks with Highway Connections for Speech Recognition | | 0 |
| Open Source German Distant Speech Recognition: Corpus and Acoustic Model | Code | 0 |
| Deep Learning Algorithms with Applications to Video Analytics for A Smart City: A Survey | | 0 |
| Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | Code | 1 |
| Deep Learning for Single and Multi-Session i-Vector Speaker Recognition | | 0 |
| THCHS-30: A Free Chinese Speech Corpus | Code | 0 |
| 調變頻譜分解技術於強健語音辨識之研究 (Investigating Modulation Spectrum Factorization Techniques for Robust Speech Recognition) [In Chinese] | | 0 |
| Development of Speech corpora for different Speech Recognition tasks in Malayalam language | | 0 |
| Isolated Word Recognition System for Malayalam using Machine Learning | | 0 |
| Listening With Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines | | 0 |
| Calibrated Structured Prediction | Code | 0 |
| A Short Survey on Data Clustering Algorithms | | 0 |
| Spoken Language Translation for Polish | | 0 |
| Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification | | 0 |
| Transfer Learning for Speech and Language Processing | | 0 |
| Task Loss Estimation for Sequence Prediction | Code | 0 |
| Recurrent Models for Auditory Attention in Multi-Microphone Distance Speech Recognition | | 0 |
| Blending LSTMs into CNNs | | 0 |
| Enhancements in statistical spoken language translation by de-normalization of ASR results | | 0 |
| Learning to retrieve out-of-vocabulary words in speech recognition | | 0 |
| Latent Dirichlet Allocation Based Organisation of Broadcast Media Archives for Deep Neural Network Adaptation | | 0 |
| Neural Programmer: Inducing Latent Programs with Gradient Descent | | 0 |
| Learning Representations of Affect from Speech | | 0 |
| Towards Structured Deep Neural Network for Automatic Speech Recognition | | 0 |
| Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AmNet | Word Error Rate (WER) | 8.6 | | Unverified |
| 2 | HMM-(SAT)GMM | Word Error Rate (WER) | 8 | | Unverified |
| 3 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 7.19 | | Unverified |
| 4 | Snips | Word Error Rate (WER) | 6.4 | | Unverified |
| 5 | Li-GRU | Word Error Rate (WER) | 6.2 | | Unverified |
| 6 | HMM-DNN + pNorm* | Word Error Rate (WER) | 5.5 | | Unverified |
| 7 | CTC + policy learning | Word Error Rate (WER) | 5.42 | | Unverified |
| 8 | Deep Speech 2 | Word Error Rate (WER) | 5.33 | | Unverified |
| 9 | HMM-TDNN + iVectors | Word Error Rate (WER) | 4.8 | | Unverified |
| 10 | Gated ConvNets | Word Error Rate (WER) | 4.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 20.84 | | Unverified |
| 2 | Snips | Word Error Rate (WER) | 16.5 | | Unverified |
| 3 | Local Prior Matching (Large Model, ConvLM LM) | Word Error Rate (WER) | 15.28 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 13.25 | | Unverified |
| 5 | TDNN + pNorm + speed up/down speech | Word Error Rate (WER) | 12.5 | | Unverified |
| 6 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 10.65 | | Unverified |
| 7 | Convolutional Speech Recognition | Word Error Rate (WER) | 10.47 | | Unverified |
| 8 | MT4SSL | Word Error Rate (WER) | 9.6 | | Unverified |
| 9 | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 | | Unverified |
| 10 | Espresso | Word Error Rate (WER) | 8.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Deep Speech | Percentage error | 20 | | Unverified |
| 2 | DNN-HMM | Percentage error | 18.5 | | Unverified |
| 3 | CD-DNN | Percentage error | 16.1 | | Unverified |
| 4 | DNN | Percentage error | 16 | | Unverified |
| 5 | DNN + Dropout | Percentage error | 15 | | Unverified |
| 6 | DNN BMMI | Percentage error | 12.9 | | Unverified |
| 7 | DNN MPE | Percentage error | 12.9 | | Unverified |
| 8 | DNN MMI | Percentage error | 12.9 | | Unverified |
| 9 | HMM-TDNN + pNorm + speed up/down speech | Percentage error | 12.9 | | Unverified |
| 10 | HMM-DNN +sMBR | Percentage error | 12.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | LSNN | Percentage error | 33.2 | | Unverified |
| 2 | LAS multitask with indicators sampling | Percentage error | 20.4 | | Unverified |
| 3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 | | Unverified |
| 4 | QCNN-10L-256FM | Percentage error | 19.64 | | Unverified |
| 5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 | | Unverified |
| 6 | Bi-RNN + Attention | Percentage error | 17.6 | | Unverified |
| 7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 | | Unverified |
| 8 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 | | Unverified |
| 9 | Light Gated Recurrent Units | Percentage error | 16.7 | | Unverified |
| 10 | GRU | Percentage error | 16.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Att | Word Error Rate (WER) | 18.7 | | Unverified |
| 2 | CTC/Att | Word Error Rate (WER) | 6.7 | | Unverified |
| 3 | BRA-E | Word Error Rate (WER) | 6.63 | | Unverified |
| 4 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 6.34 | | Unverified |
| 5 | BAT | Word Error Rate (WER) | 4.97 | | Unverified |
| 6 | Paraformer | Word Error Rate (WER) | 4.95 | | Unverified |
| 7 | U2 | Word Error Rate (WER) | 4.72 | | Unverified |
| 8 | UMA | Word Error Rate (WER) | 4.7 | | Unverified |
| 9 | Lightweight Transducer | Word Error Rate (WER) | 4.31 | | Unverified |
| 10 | CIF-HKD With LM | Word Error Rate (WER) | 4.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Jasper 10x3 | Word Error Rate (WER) | 6.9 | | Unverified |
| 2 | CNN over RAW speech (wav) | Word Error Rate (WER) | 5.6 | | Unverified |
| 3 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 3.79 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 3.6 | | Unverified |
| 5 | test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | Word Error Rate (WER) | 3.6 | | Unverified |
| 6 | Convolutional Speech Recognition | Word Error Rate (WER) | 3.5 | | Unverified |
| 7 | TC-DNN-BLSTM-DNN | Word Error Rate (WER) | 3.5 | | Unverified |
| 8 | Espresso | Word Error Rate (WER) | 3.4 | | Unverified |
| 9 | CTC-CRF VGG-BLSTM | Word Error Rate (WER) | 3.2 | | Unverified |
| 10 | Transformer with Relaxed Attention | Word Error Rate (WER) | 3.19 | | Unverified |
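
Most of the tables above report Word Error Rate (WER). As a rough illustration of how such a figure is typically computed, the sketch below takes the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a system output and divides it by the number of reference words. The function name `word_error_rate` is hypothetical, and the whitespace tokenization with no case or punctuation normalization is an assumption; the listed benchmarks may apply their own scoring rules.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Assumption: words are split on whitespace and compared exactly,
    with no case folding or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))  # 25.0
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.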