SOTAVerified

Speech Recognition

Speech Recognition is the task of converting spoken language into text: recognizing the words spoken in an audio signal and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, despite variation in accent, speaking speed, and background noise.


Papers

Showing 4351-4400 of 6433 papers

| Title | Status | Hype |
|---|---|---|
| Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition | Code | 0 |
| An Empirical Study of Efficient ASR Rescoring with Transformers | | 0 |
| Recognizing long-form speech using streaming end-to-end models | | 0 |
| A Bayesian Approach to Recurrence in Neural Networks | | 0 |
| ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit | Code | 0 |
| Pre-training in Deep Reinforcement Learning for Automatic Speech Recognition | | 0 |
| Generative Pre-Training for Speech with Autoregressive Predictive Coding | Code | 1 |
| Analyzing ASR pretraining for low-resource speech-to-text translation | | 0 |
| Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model | | 0 |
| RNN based Incremental Online Spoken Language Understanding | | 0 |
| A Transformer with Interleaved Self-attention and Convolution for Hybrid Acoustic Models | Code | 0 |
| A practical two-stage training strategy for multi-stream end-to-end speech recognition | | 0 |
| Efficient Dynamic WFST Decoding for Personalized Language Models | | 0 |
| G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR | | 0 |
| Robust Neural Machine Translation for Clean and Noisy Speech Transcripts | | 0 |
| GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition | Code | 1 |
| Improving Transformer-based Speech Recognition Using Unsupervised Pre-training | Code | 1 |
| Adversarial Example Detection by Classification for Deep Speech Recognition | Code | 0 |
| Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing | Code | 0 |
| Transformer-based Acoustic Modeling for Hybrid Speech Recognition | | 0 |
| AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks | | 0 |
| Signal Combination for Language Identification | | 0 |
| Predicting ice flow using machine learning | | 0 |
| Neuro-SERKET: Development of Integrative Cognitive System through the Composition of Deep Probabilistic Generative Models | | 0 |
| Indian EmoSpeech Command Dataset: A dataset for emotion based speech recognition in the wild | Code | 0 |
| End-to-End Speech Recognition: A review for the French Language | | 0 |
| Multi-Talker MVDR Beamforming Based on Extended Complex Gaussian Mixture Model | | 0 |
| Detecting Multiple Speech Disfluencies using a Deep Residual Network with Bidirectional Long Short-Term Memory | | 0 |
| LibriVoxDeEn: A Corpus for German-to-English Speech Translation and German Speech Recognition | Code | 0 |
| Transformer ASR with Contextual Block Processing | | 0 |
| Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition | | 0 |
| Transfer Learning for Algorithm Recommendation | | 0 |
| Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition | | 0 |
| MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition | | 0 |
| vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations | Code | 1 |
| A Research Platform for Multi-Robot Dialogue with Humans | | 0 |
| VAIS ASR: Building a conversational speech recognition system using language model combination | | 0 |
| Query-by-example on-device keyword spotting | | 0 |
| Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems | | 0 |
| One-To-Many Multilingual End-to-end Speech Translation | | 0 |
| A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions | | 0 |
| Adapting a FrameNet Semantic Parser for Spoken Language Understanding Using Adversarial Learning | | 0 |
| Distributed Learning of Deep Neural Networks using Independent Subnet Training | Code | 0 |
| Modeling Confidence in Sequence-to-Sequence Models | | 0 |
| SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition | | 0 |
| Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System | | 0 |
| Convolutional Neural Networks for Speech Controlled Prosthetic Hands | | 0 |
| From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition | | 0 |
| 室內遠距離語音辨識實驗 (Experiments on In-House Far-Field Speech Recognition) | | 0 |
| 探究端對端語音辨識於發音檢測與診斷 (Investigating on Computer-Assisted Pronunciation Training Leveraging End-to-End Speech Recognition Techniques) | | 0 |
Page 88 of 129

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AmNet | Word Error Rate (WER) | 8.6 | | Unverified |
| 2 | HMM-(SAT)GMM | Word Error Rate (WER) | 8 | | Unverified |
| 3 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 7.19 | | Unverified |
| 4 | Snips | Word Error Rate (WER) | 6.4 | | Unverified |
| 5 | Li-GRU | Word Error Rate (WER) | 6.2 | | Unverified |
| 6 | HMM-DNN + pNorm* | Word Error Rate (WER) | 5.5 | | Unverified |
| 7 | CTC + policy learning | Word Error Rate (WER) | 5.42 | | Unverified |
| 8 | Deep Speech 2 | Word Error Rate (WER) | 5.33 | | Unverified |
| 9 | HMM-TDNN + iVectors | Word Error Rate (WER) | 4.8 | | Unverified |
| 10 | Gated ConvNets | Word Error Rate (WER) | 4.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 20.84 | | Unverified |
| 2 | Snips | Word Error Rate (WER) | 16.5 | | Unverified |
| 3 | Local Prior Matching (Large Model, ConvLM LM) | Word Error Rate (WER) | 15.28 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 13.25 | | Unverified |
| 5 | TDNN + pNorm + speed up/down speech | Word Error Rate (WER) | 12.5 | | Unverified |
| 6 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 10.65 | | Unverified |
| 7 | Convolutional Speech Recognition | Word Error Rate (WER) | 10.47 | | Unverified |
| 8 | MT4SSL | Word Error Rate (WER) | 9.6 | | Unverified |
| 9 | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 | | Unverified |
| 10 | Espresso | Word Error Rate (WER) | 8.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Deep Speech | Percentage error | 20 | | Unverified |
| 2 | DNN-HMM | Percentage error | 18.5 | | Unverified |
| 3 | CD-DNN | Percentage error | 16.1 | | Unverified |
| 4 | DNN | Percentage error | 16 | | Unverified |
| 5 | DNN + Dropout | Percentage error | 15 | | Unverified |
| 6 | DNN BMMI | Percentage error | 12.9 | | Unverified |
| 7 | DNN MPE | Percentage error | 12.9 | | Unverified |
| 8 | DNN MMI | Percentage error | 12.9 | | Unverified |
| 9 | HMM-TDNN + pNorm + speed up/down speech | Percentage error | 12.9 | | Unverified |
| 10 | HMM-DNN +sMBR | Percentage error | 12.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSNN | Percentage error | 33.2 | | Unverified |
| 2 | LAS multitask with indicators sampling | Percentage error | 20.4 | | Unverified |
| 3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 | | Unverified |
| 4 | QCNN-10L-256FM | Percentage error | 19.64 | | Unverified |
| 5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 | | Unverified |
| 6 | Bi-RNN + Attention | Percentage error | 17.6 | | Unverified |
| 7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 | | Unverified |
| 8 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 | | Unverified |
| 9 | Light Gated Recurrent Units | Percentage error | 16.7 | | Unverified |
| 10 | GRU | Percentage error | 16.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Att | Word Error Rate (WER) | 18.7 | | Unverified |
| 2 | CTC/Att | Word Error Rate (WER) | 6.7 | | Unverified |
| 3 | BRA-E | Word Error Rate (WER) | 6.63 | | Unverified |
| 4 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 6.34 | | Unverified |
| 5 | BAT | Word Error Rate (WER) | 4.97 | | Unverified |
| 6 | Paraformer | Word Error Rate (WER) | 4.95 | | Unverified |
| 7 | U2 | Word Error Rate (WER) | 4.72 | | Unverified |
| 8 | UMA | Word Error Rate (WER) | 4.7 | | Unverified |
| 9 | Lightweight Transducer | Word Error Rate (WER) | 4.31 | | Unverified |
| 10 | CIF-HKD With LM | Word Error Rate (WER) | 4.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Jasper 10x3 | Word Error Rate (WER) | 6.9 | | Unverified |
| 2 | CNN over RAW speech (wav) | Word Error Rate (WER) | 5.6 | | Unverified |
| 3 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 3.79 | | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 3.6 | | Unverified |
| 5 | test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | Word Error Rate (WER) | 3.6 | | Unverified |
| 6 | Convolutional Speech Recognition | Word Error Rate (WER) | 3.5 | | Unverified |
| 7 | TC-DNN-BLSTM-DNN | Word Error Rate (WER) | 3.5 | | Unverified |
| 8 | Espresso | Word Error Rate (WER) | 3.4 | | Unverified |
| 9 | CTC-CRF VGG-BLSTM | Word Error Rate (WER) | 3.2 | | Unverified |
| 10 | Transformer with Relaxed Attention | Word Error Rate (WER) | 3.19 | | Unverified |
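
Most of the results above are reported as Word Error Rate (WER). This page does not spell out the metric, so the following is only a minimal sketch of the standard definition: the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a hypothesis, divided by the number of reference words and expressed as a percentage. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; the listed systems may apply their own text normalization and scoring tools before computing WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using plain whitespace tokenization (an assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))
```

Because insertions count as errors, WER computed this way can exceed 100 when the hypothesis is much longer than the reference.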