
Speech Recognition

Speech Recognition is the task of converting spoken language into text: recognizing the words spoken in an audio recording and transcribing them in written form. The goal is to transcribe speech accurately, either in real time or from recorded audio, despite variation in accent, speaking rate, and background noise.

(Image credit: SpecAugment)

Papers

Showing 3451–3500 of 6433 papers

| Title | Status | Hype |
|---|---|---|
| Overcoming Domain Mismatch in Low Resource Sequence-to-Sequence ASR Models using Hybrid Generated Pseudotranscripts |  | 0 |
| Cross-utterance Reranking Models with BERT and Graph Convolutional Networks for Conversational Speech Recognition |  | 0 |
| Improving RNN-T ASR Performance with Date-Time and Location Awareness |  | 0 |
| Leveraging Pre-trained Language Model for Speech Sentiment Analysis |  | 0 |
| TASK AWARE MULTI-TASK LEARNING FOR SPEECH TO TEXT TASKS |  | 0 |
| PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition |  | 0 |
| Balanced End-to-End Monolingual pre-training for Low-Resourced Indic Languages Code-Switching Speech Recognition |  | 0 |
| U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition |  | 0 |
| A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition |  | 0 |
| Unsupervised Automatic Speech Recognition: A Review |  | 0 |
| Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition |  | 0 |
| Sequential End-to-End Intent and Slot Label Classification and Localization |  | 0 |
| Muddling Label Regularization: Deep Learning for Tabular Datasets | Code | 0 |
| Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios |  | 0 |
| Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement |  | 0 |
| Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition |  | 0 |
| Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio |  | 0 |
| Semantic-WER: A Unified Metric for the Evaluation of ASR Transcript for End Usability |  | 0 |
| A Discussion On the Validity of Manifold Learning |  | 0 |
| Improving low-resource ASR performance with untranscribed out-of-domain data |  | 0 |
| Dual Script E2E framework for Multilingual and Code-Switching ASR |  | 0 |
| Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition |  | 0 |
| Evaluating Automatic Speech Recognition Quality and Its Impact on Counselor Utterance Coding |  | 0 |
| End-to-end ASR to jointly predict transcriptions and linguistic annotations |  | 0 |
| End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec |  | 0 |
| Multilingual Speech Translation with Unified Transformer: Huawei Noah's Ark Lab at IWSLT 2021 |  | 0 |
| Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation |  | 0 |
| A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data |  | 0 |
| Developing ASR for Indonesian-English Bilingual Language Teaching |  | 0 |
| A Preliminary Study of Formosa Speech Recognition Challenge 2020 – Taiwanese ASR |  | 0 |
| Language ID Prediction from Speech Using Self-Attentive Pooling |  | 0 |
| NSYSU-MITLab Speech Recognition System for Formosa Speech Recognition Challenge 2020 |  | 0 |
| Low-Resource Spoken Language Identification Using Self-Attentive Pooling and Deep 1D Time-Channel Separable Convolutions |  | 0 |
| Fine-grained Generalization Analysis of Structured Output Prediction |  | 0 |
| Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR |  | 0 |
| Bangla Natural Language Processing: A Comprehensive Analysis of Classical, Machine Learning, and Deep Learning Based Methods |  | 0 |
| Quantization and Deployment of Deep Neural Networks on Microcontrollers | Code | 0 |
| Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition |  | 0 |
| Training Speech Enhancement Systems with Noisy Speech Datasets |  | 0 |
| Unsupervised Speech Recognition |  | 0 |
| Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries |  | 0 |
| A Streaming End-to-End Framework For Spoken Language Understanding |  | 0 |
| Exploiting Adapters for Cross-lingual Low-resource Speech Recognition | Code | 0 |
| LiSTra, Automatic Speech Translation: English to Lingala case study |  | 0 |
| Hardware Synthesis of State-Space Equations; Application to FPGA Implementation of Shallow and Deep Neural Networks | Code | 0 |
| Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation |  | 0 |
| Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End |  | 0 |
| Exploring CTC Based End-to-End Techniques for Myanmar Speech Recognition |  | 0 |
| Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition |  | 0 |
| StutterNet: Stuttering Detection Using Time Delay Neural Network |  | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AmNet | Word Error Rate (WER) | 8.6 |  | Unverified |
| 2 | HMM-(SAT)GMM | Word Error Rate (WER) | 8 |  | Unverified |
| 3 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 7.19 |  | Unverified |
| 4 | Snips | Word Error Rate (WER) | 6.4 |  | Unverified |
| 5 | Li-GRU | Word Error Rate (WER) | 6.2 |  | Unverified |
| 6 | HMM-DNN + pNorm* | Word Error Rate (WER) | 5.5 |  | Unverified |
| 7 | CTC + policy learning | Word Error Rate (WER) | 5.42 |  | Unverified |
| 8 | Deep Speech 2 | Word Error Rate (WER) | 5.33 |  | Unverified |
| 9 | HMM-TDNN + iVectors | Word Error Rate (WER) | 4.8 |  | Unverified |
| 10 | Gated ConvNets | Word Error Rate (WER) | 4.8 |  | Unverified |
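
Word Error Rate, the metric in most of these tables, is the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words: WER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions, and N is the reference length. The sketch below is a minimal illustration assuming whitespace tokenization and no text normalization (casing, punctuation); the names `word_errors` and `wer` are hypothetical and not taken from any listed paper or toolkit.

```python
from typing import Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (S + D + I, N): word-level edit distance and reference length."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    # prev[j] = minimum edits turning the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + sub_cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1], len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate in percent; may exceed 100 when there are many insertions."""
    errors, n = word_errors(reference, hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * errors / n


if __name__ == "__main__":
    # one substitution (sat -> sit) and one deletion ("the") over 6 reference words
    print(f"{wer('the cat sat on the mat', 'the cat sit on mat'):.2f}")  # 33.33
```

Because insertions are counted against a fixed N, a hypothesis with many spurious words can push WER above 100%, so it is an error rate rather than a bounded accuracy score.
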

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 20.84 |  | Unverified |
| 2 | Snips | Word Error Rate (WER) | 16.5 |  | Unverified |
| 3 | Local Prior Matching (Large Model, ConvLM LM) | Word Error Rate (WER) | 15.28 |  | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 13.25 |  | Unverified |
| 5 | TDNN + pNorm + speed up/down speech | Word Error Rate (WER) | 12.5 |  | Unverified |
| 6 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 10.65 |  | Unverified |
| 7 | Convolutional Speech Recognition | Word Error Rate (WER) | 10.47 |  | Unverified |
| 8 | MT4SSL | Word Error Rate (WER) | 9.6 |  | Unverified |
| 9 | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 |  | Unverified |
| 10 | Espresso | Word Error Rate (WER) | 8.7 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Deep Speech | Percentage error | 20 |  | Unverified |
| 2 | DNN-HMM | Percentage error | 18.5 |  | Unverified |
| 3 | CD-DNN | Percentage error | 16.1 |  | Unverified |
| 4 | DNN | Percentage error | 16 |  | Unverified |
| 5 | DNN + Dropout | Percentage error | 15 |  | Unverified |
| 6 | DNN BMMI | Percentage error | 12.9 |  | Unverified |
| 7 | DNN MPE | Percentage error | 12.9 |  | Unverified |
| 8 | DNN MMI | Percentage error | 12.9 |  | Unverified |
| 9 | HMM-TDNN + pNorm + speed up/down speech | Percentage error | 12.9 |  | Unverified |
| 10 | HMM-DNN + sMBR | Percentage error | 12.6 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSNN | Percentage error | 33.2 |  | Unverified |
| 2 | LAS multitask with indicators sampling | Percentage error | 20.4 |  | Unverified |
| 3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 |  | Unverified |
| 4 | QCNN-10L-256FM | Percentage error | 19.64 |  | Unverified |
| 5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 |  | Unverified |
| 6 | Bi-RNN + Attention | Percentage error | 17.6 |  | Unverified |
| 7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 |  | Unverified |
| 8 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 |  | Unverified |
| 9 | Light Gated Recurrent Units | Percentage error | 16.7 |  | Unverified |
| 10 | GRU | Percentage error | 16.6 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Att | Word Error Rate (WER) | 18.7 |  | Unverified |
| 2 | CTC/Att | Word Error Rate (WER) | 6.7 |  | Unverified |
| 3 | BRA-E | Word Error Rate (WER) | 6.63 |  | Unverified |
| 4 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 6.34 |  | Unverified |
| 5 | BAT | Word Error Rate (WER) | 4.97 |  | Unverified |
| 6 | Paraformer | Word Error Rate (WER) | 4.95 |  | Unverified |
| 7 | U2 | Word Error Rate (WER) | 4.72 |  | Unverified |
| 8 | UMA | Word Error Rate (WER) | 4.7 |  | Unverified |
| 9 | Lightweight Transducer | Word Error Rate (WER) | 4.31 |  | Unverified |
| 10 | CIF-HKD With LM | Word Error Rate (WER) | 4.1 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Jasper 10x3 | Word Error Rate (WER) | 6.9 |  | Unverified |
| 2 | CNN over RAW speech (wav) | Word Error Rate (WER) | 5.6 |  | Unverified |
| 3 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 3.79 |  | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 3.6 |  | Unverified |
| 5 | test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | Word Error Rate (WER) | 3.6 |  | Unverified |
| 6 | Convolutional Speech Recognition | Word Error Rate (WER) | 3.5 |  | Unverified |
| 7 | TC-DNN-BLSTM-DNN | Word Error Rate (WER) | 3.5 |  | Unverified |
| 8 | Espresso | Word Error Rate (WER) | 3.4 |  | Unverified |
| 9 | CTC-CRF VGG-BLSTM | Word Error Rate (WER) | 3.2 |  | Unverified |
| 10 | Transformer with Relaxed Attention | Word Error Rate (WER) | 3.19 |  | Unverified |
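
Test-set WER figures like these are conventionally corpus-level: word errors are summed over all utterances and divided by the total number of reference words, rather than averaging per-utterance WERs (which would give short utterances disproportionate weight). This page does not say how each entry was computed, so treat that convention as an assumption. The minimal sketch below contrasts the two aggregations; the function names are hypothetical, and it again assumes whitespace tokenization with no normalization.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words, in percent."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_errors += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * total_errors / total_words


def mean_utterance_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean of per-utterance WERs, in percent (shown for contrast)."""
    rates = []
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        if not ref_words:
            raise ValueError("empty reference")
        errors = edit_distance(ref_words, hypothesis.split())
        rates.append(100.0 * errors / len(ref_words))
    if not rates:
        raise ValueError("no utterances")
    return sum(rates) / len(rates)


if __name__ == "__main__":
    test_set = [
        ("hello world", "hello word"),  # 1 error over 2 words
        ("this is a much longer reference sentence",
         "this is a much longer reference sentence"),  # 0 errors over 7 words
    ]
    print(f"corpus WER: {corpus_wer(test_set):.2f}")                  # 11.11
    print(f"mean utterance WER: {mean_utterance_wer(test_set):.2f}")  # 25.00
```

The same single error yields 11.11% at corpus level but 25.00% as a per-utterance mean, which is one reason numbers computed under different aggregation rules are not directly comparable.
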