SOTAVerified

Speech Recognition

Speech recognition is the task of converting spoken language into text: recognizing the words spoken in an audio recording and transcribing them in written form. The goal is accurate transcription, either in real time or from recorded audio, that remains robust to accents, speaking speed, and background noise.

Papers

Showing 3701-3750 of 6433 papers

| Title | Status | Hype |
| Progress in Multilingual Speech Recognition for Low Resource Languages Kurmanji Kurdish, Cree and Inuktut |  | 0 |
| Progressive Down-Sampling for Acoustic Encoding |  | 0 |
| Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition |  | 0 |
| Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks |  | 0 |
| Progressive Multi-Scale Self-Supervised Learning for Speech Recognition |  | 0 |
| Progressive Residual Extraction based Pre-training for Speech Representation Learning |  | 0 |
| Progressive unsupervised domain adaptation for ASR using ensemble models and multi-stage training |  | 0 |
| Projection of Turn Completion in Incremental Spoken Dialogue Systems |  | 0 |
| Prompt-based Content Scoring for Automated Spoken Language Assessment |  | 0 |
| Promptformer: Prompted Conformer Transducer for ASR |  | 0 |
| Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition |  | 0 |
| Prompting Large Language Models with Speech Recognition Abilities |  | 0 |
| Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection |  | 0 |
| Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition |  | 0 |
| PronouncUR: An Urdu Pronunciation Lexicon Generator |  | 0 |
| Pronunciation Adaptation For Disordered Speech Recognition Using State-Specific Vectors of Phone-Cluster Adaptive Training |  | 0 |
| Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition |  | 0 |
| Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations |  | 0 |
| Pronunciation Generation for Foreign Language Words in Intra-Sentential Code-Switching Speech Recognition |  | 0 |
| Pronunciation Modeling of Foreign Words for Mandarin ASR by Considering the Effect of Language Transfer |  | 0 |
| Pronunciation recognition of English phonemes /@/, /æ/, /A:/ and /2/ using Formants and Mel Frequency Cepstral Coefficients |  | 0 |
| Pronunciation Variants and ASR of Colloquial Speech: A Case Study on Czech |  | 0 |
| Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases |  | 0 |
| Prosomarker: a prosodic analysis tool based on optimal pitch stylization and automatic syllabification |  | 0 |
| Protecting gender and identity with disentangled speech representations |  | 0 |
| Pruned RNN-T for fast, memory-efficient ASR training |  | 0 |
| Pseudo-Labeling for Massively Multilingual Speech Recognition |  | 0 |
| Pseudo Label Is Better Than Human Label |  | 0 |
| PSRB: A Comprehensive Benchmark for Evaluating Persian ASR Systems |  | 0 |
| Punctuation Prediction for Polish Texts using Transformers |  | 0 |
| Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings? |  | 0 |
| Punctuation Prediction with Transition-based Parsing |  | 0 |
| SemEval 2022 Task 12: Symlink- Linking Mathematical Symbols to their Descriptions |  | 0 |
| Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning |  | 0 |
| Purely sequence-trained neural networks for ASR based on lattice-free MMI |  | 0 |
| Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs |  | 0 |
| Pushing the Limits of Non-Autoregressive Speech Recognition |  | 0 |
| PyDial: A Multi-domain Statistical Dialogue System Toolkit |  | 0 |
| Pynini: A Python library for weighted finite-state grammar compilation |  | 0 |
| PyOpenDial: A Python-based Domain-Independent Toolkit for Developing Spoken Dialogue Systems with Probabilistic Rules |  | 0 |
| Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition |  | 0 |
| 運用Python結合語音辨識及合成技術於自動化音文同步之實作 (A Python Implementation of Automatic Speech-text Synchronization Using Speech Recognition and Text-to-Speech Technology) [In Chinese] |  | 0 |
| QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus |  | 0 |
| QASR: QCRI Aljazeera Speech Resource A Large Scale Annotated Arabic Speech Corpus |  | 0 |
| QCRI Live Speech Translation System |  | 0 |
| Qieemo: Speech Is All You Need in the Emotion Recognition in Conversations |  | 0 |
| Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition |  | 0 |
| Qualitative investigation of the display of speech recognition results for communication with deaf people |  | 0 |
| Quality Estimation for Automatic Speech Recognition |  | 0 |
| Quantification of stylistic differences in human- and ASR-produced transcripts of African American English |  | 0 |
Page 75 of 129

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| 1 | AmNet | Word Error Rate (WER) | 8.6 |  | Unverified |
| 2 | HMM-(SAT)GMM | Word Error Rate (WER) | 8 |  | Unverified |
| 3 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 7.19 |  | Unverified |
| 4 | Snips | Word Error Rate (WER) | 6.4 |  | Unverified |
| 5 | Li-GRU | Word Error Rate (WER) | 6.2 |  | Unverified |
| 6 | HMM-DNN + pNorm* | Word Error Rate (WER) | 5.5 |  | Unverified |
| 7 | CTC + policy learning | Word Error Rate (WER) | 5.42 |  | Unverified |
| 8 | Deep Speech 2 | Word Error Rate (WER) | 5.33 |  | Unverified |
| 9 | HMM-TDNN + iVectors | Word Error Rate (WER) | 4.8 |  | Unverified |
| 10 | Gated ConvNets | Word Error Rate (WER) | 4.8 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | Local Prior Matching (Large Model) | Word Error Rate (WER) | 20.84 |  | Unverified |
| 2 | Snips | Word Error Rate (WER) | 16.5 |  | Unverified |
| 3 | Local Prior Matching (Large Model, ConvLM LM) | Word Error Rate (WER) | 15.28 |  | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 13.25 |  | Unverified |
| 5 | TDNN + pNorm + speed up/down speech | Word Error Rate (WER) | 12.5 |  | Unverified |
| 6 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 10.65 |  | Unverified |
| 7 | Convolutional Speech Recognition | Word Error Rate (WER) | 10.47 |  | Unverified |
| 8 | MT4SSL | Word Error Rate (WER) | 9.6 |  | Unverified |
| 9 | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 |  | Unverified |
| 10 | Espresso | Word Error Rate (WER) | 8.7 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | Deep Speech | Percentage error | 20 |  | Unverified |
| 2 | DNN-HMM | Percentage error | 18.5 |  | Unverified |
| 3 | CD-DNN | Percentage error | 16.1 |  | Unverified |
| 4 | DNN | Percentage error | 16 |  | Unverified |
| 5 | DNN + Dropout | Percentage error | 15 |  | Unverified |
| 6 | DNN BMMI | Percentage error | 12.9 |  | Unverified |
| 7 | DNN MPE | Percentage error | 12.9 |  | Unverified |
| 8 | DNN MMI | Percentage error | 12.9 |  | Unverified |
| 9 | HMM-TDNN + pNorm + speed up/down speech | Percentage error | 12.9 |  | Unverified |
| 10 | HMM-DNN +sMBR | Percentage error | 12.6 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | LSNN | Percentage error | 33.2 |  | Unverified |
| 2 | LAS multitask with indicators sampling | Percentage error | 20.4 |  | Unverified |
| 3 | Soft Monotonic Attention (ours, offline) | Percentage error | 20.1 |  | Unverified |
| 4 | QCNN-10L-256FM | Percentage error | 19.64 |  | Unverified |
| 5 | Bi-LSTM + skip connections w/ CTC | Percentage error | 17.7 |  | Unverified |
| 6 | Bi-RNN + Attention | Percentage error | 17.6 |  | Unverified |
| 7 | RNN-CRF on 24(x3) MFSC | Percentage error | 17.3 |  | Unverified |
| 8 | CNN in time and frequency + dropout, 17.6% w/o dropout | Percentage error | 16.7 |  | Unverified |
| 9 | Light Gated Recurrent Units | Percentage error | 16.7 |  | Unverified |
| 10 | GRU | Percentage error | 16.6 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | Att | Word Error Rate (WER) | 18.7 |  | Unverified |
| 2 | CTC/Att | Word Error Rate (WER) | 6.7 |  | Unverified |
| 3 | BRA-E | Word Error Rate (WER) | 6.63 |  | Unverified |
| 4 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 6.34 |  | Unverified |
| 5 | BAT | Word Error Rate (WER) | 4.97 |  | Unverified |
| 6 | Paraformer | Word Error Rate (WER) | 4.95 |  | Unverified |
| 7 | U2 | Word Error Rate (WER) | 4.72 |  | Unverified |
| 8 | UMA | Word Error Rate (WER) | 4.7 |  | Unverified |
| 9 | Lightweight Transducer | Word Error Rate (WER) | 4.31 |  | Unverified |
| 10 | CIF-HKD With LM | Word Error Rate (WER) | 4.1 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| 1 | Jasper 10x3 | Word Error Rate (WER) | 6.9 |  | Unverified |
| 2 | CNN over RAW speech (wav) | Word Error Rate (WER) | 5.6 |  | Unverified |
| 3 | CTC-CRF 4gram-LM | Word Error Rate (WER) | 3.79 |  | Unverified |
| 4 | Deep Speech 2 | Word Error Rate (WER) | 3.6 |  | Unverified |
| 5 | test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* | Word Error Rate (WER) | 3.6 |  | Unverified |
| 6 | Convolutional Speech Recognition | Word Error Rate (WER) | 3.5 |  | Unverified |
| 7 | TC-DNN-BLSTM-DNN | Word Error Rate (WER) | 3.5 |  | Unverified |
| 8 | Espresso | Word Error Rate (WER) | 3.4 |  | Unverified |
| 9 | CTC-CRF VGG-BLSTM | Word Error Rate (WER) | 3.2 |  | Unverified |
| 10 | Transformer with Relaxed Attention | Word Error Rate (WER) | 3.19 |  | Unverified |
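
Most of the results above are reported as Word Error Rate (WER). As a rough illustration of what that number measures, the sketch below computes WER in the usual way: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a system hypothesis, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the choice to return a percentage are assumptions made for this example; the listed papers may apply their own text normalization and scoring tools, so this is not necessarily how any claimed figure was produced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using plain whitespace tokenization.

    Hypothetical helper for illustration only; real evaluations usually
    normalize case and punctuation before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (rolling one-row DP).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(f"WER: {word_error_rate(ref, hyp):.2f}%")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.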