Speech Recognition

Speech recognition is the task of converting spoken language into text: recognizing the words in an audio signal and transcribing them in written form. The goal is accurate transcription, either in real time or from recorded audio, that stays robust to factors such as accents, speaking speed, and background noise.

(Image credit: SpecAugment)
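
Most of the benchmark tables on this page report Word Error Rate (WER). As a rough illustration of what that number measures, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; real evaluations usually normalize casing and punctuation first, and the individual papers listed here may score differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch only: tokens are obtained by splitting on whitespace,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {100 * wer:.2f}%")
```

The function returns a fraction, while the tables list percentages, so multiply by 100 to compare. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.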

Papers

Showing 3276–3300 of 6433 papers

Title	Date	Tasks	Status
The Future of Spoken Dialogue Systems is in their Past: Long-Term Adaptive, Conversational Assistants	Jun 1, 2012	Language Modelling, Speech Recognition	Unverified
The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning	Sep 29, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The GUA-Speech System Description for CNVSRC Challenge 2023	Dec 12, 2023	Decoder, Language Modeling	Unverified
The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis	Aug 1, 2013	Dependency Parsing, Feature Engineering	Unverified
The Herme Database of Spontaneous Multimodal Human-Robot Dialogues	May 1, 2012	Gesture Recognition, Speech Recognition	Unverified
The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches	Mar 3, 2018	Deep Learning, Deep Reinforcement Learning	Unverified
The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation	Aug 9, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The HW-TSC’s Offline Speech Translation System for IWSLT 2022 Evaluation	May 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The IBM 2015 English Conversational Telephone Speech Recognition System	May 21, 2015	Language Modeling, Language Modelling	Unverified
The IBM 2016 English Conversational Telephone Speech Recognition System	Apr 27, 2016	Language Modeling, Language Modelling	Unverified
The IBM 2016 Speaker Recognition System	Feb 23, 2016	2k, Automatic Speech Recognition	Unverified
The IBM Speaker Recognition System: Recent Advances and Error Analysis	May 5, 2016	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The ILMT-s2s Corpus ― A Multimodal Interlingual Map Task Corpus	May 1, 2016	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR	Mar 30, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The Importance of Recommender and Feedback Features in a Pronunciation Learning Aid	Jul 1, 2018	Information Retrieval, Recommendation Systems	Unverified
The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software	Dec 1, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The InproTK 2012 release	Jun 1, 2012	Dialogue Management, Speech Recognition	Unverified
The IOIT English ASR system for IWSLT 2016	Dec 1, 2016	Language Modeling, Language Modelling	Unverified
The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation	May 1, 2012	Machine Translation, speech-recognition	Unverified
The IWSLT 2016 Evaluation Campaign	Dec 1, 2016	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The IWSLT 2019 KIT Speech Translation System	Nov 1, 2019	speech-recognition, Speech Recognition	Unverified
The IWSLT 2021 BUT Speech Translation Systems	Jul 13, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge	Jun 14, 2020	Action Detection, Activity Detection	Unverified
The KIT Lecture Corpus for Speech Translation	May 1, 2012	Speech Recognition, Translation	Unverified
The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities	Jun 1, 2014	Semantic Parsing, speech-recognition	Unverified

Datasets: LibriSpeech test-clean, LibriSpeech test-other, Switchboard + Hub500, TIMIT, AISHELL-1, WSJ eval92, Common Voice German, swb_hub_500 WER fullSWBCH, TUDA, Common Voice French, Common Voice Spanish, MediaSpeech

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AmNet	Word Error Rate (WER)	8.6	—	Unverified
2	HMM-(SAT)GMM	Word Error Rate (WER)	8	—	Unverified
3	Local Prior Matching (Large Model)	Word Error Rate (WER)	7.19	—	Unverified
4	Snips	Word Error Rate (WER)	6.4	—	Unverified
5	Li-GRU	Word Error Rate (WER)	6.2	—	Unverified
6	HMM-DNN + pNorm*	Word Error Rate (WER)	5.5	—	Unverified
7	CTC + policy learning	Word Error Rate (WER)	5.42	—	Unverified
8	Deep Speech 2	Word Error Rate (WER)	5.33	—	Unverified
9	HMM-TDNN + iVectors	Word Error Rate (WER)	4.8	—	Unverified
10	Gated ConvNets	Word Error Rate (WER)	4.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Local Prior Matching (Large Model)	Word Error Rate (WER)	20.84	—	Unverified
2	Snips	Word Error Rate (WER)	16.5	—	Unverified
3	Local Prior Matching (Large Model, ConvLM LM)	Word Error Rate (WER)	15.28	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	13.25	—	Unverified
5	TDNN + pNorm + speed up/down speech	Word Error Rate (WER)	12.5	—	Unverified
6	CTC-CRF 4gram-LM	Word Error Rate (WER)	10.65	—	Unverified
7	Convolutional Speech Recognition	Word Error Rate (WER)	10.47	—	Unverified
8	MT4SSL	Word Error Rate (WER)	9.6	—	Unverified
9	Jasper DR 10x5	Word Error Rate (WER)	8.79	—	Unverified
10	Espresso	Word Error Rate (WER)	8.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Deep Speech	Percentage error	20	—	Unverified
2	DNN-HMM	Percentage error	18.5	—	Unverified
3	CD-DNN	Percentage error	16.1	—	Unverified
4	DNN	Percentage error	16	—	Unverified
5	DNN + Dropout	Percentage error	15	—	Unverified
6	DNN BMMI	Percentage error	12.9	—	Unverified
7	DNN MPE	Percentage error	12.9	—	Unverified
8	DNN MMI	Percentage error	12.9	—	Unverified
9	HMM-TDNN + pNorm + speed up/down speech	Percentage error	12.9	—	Unverified
10	HMM-DNN +sMBR	Percentage error	12.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSNN	Percentage error	33.2	—	Unverified
2	LAS multitask with indicators sampling	Percentage error	20.4	—	Unverified
3	Soft Monotonic Attention (ours, offline)	Percentage error	20.1	—	Unverified
4	QCNN-10L-256FM	Percentage error	19.64	—	Unverified
5	Bi-LSTM + skip connections w/ CTC	Percentage error	17.7	—	Unverified
6	Bi-RNN + Attention	Percentage error	17.6	—	Unverified
7	RNN-CRF on 24(x3) MFSC	Percentage error	17.3	—	Unverified
8	CNN in time and frequency + dropout, 17.6% w/o dropout	Percentage error	16.7	—	Unverified
9	Light Gated Recurrent Units	Percentage error	16.7	—	Unverified
10	GRU	Percentage error	16.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Att	Word Error Rate (WER)	18.7	—	Unverified
2	CTC/Att	Word Error Rate (WER)	6.7	—	Unverified
3	BRA-E	Word Error Rate (WER)	6.63	—	Unverified
4	CTC-CRF 4gram-LM	Word Error Rate (WER)	6.34	—	Unverified
5	BAT	Word Error Rate (WER)	4.97	—	Unverified
6	Paraformer	Word Error Rate (WER)	4.95	—	Unverified
7	U2	Word Error Rate (WER)	4.72	—	Unverified
8	UMA	Word Error Rate (WER)	4.7	—	Unverified
9	Lightweight Transducer	Word Error Rate (WER)	4.31	—	Unverified
10	CIF-HKD With LM	Word Error Rate (WER)	4.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Jasper 10x3	Word Error Rate (WER)	6.9	—	Unverified
2	CNN over RAW speech (wav)	Word Error Rate (WER)	5.6	—	Unverified
3	CTC-CRF 4gram-LM	Word Error Rate (WER)	3.79	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	3.6	—	Unverified
5	test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm*	Word Error Rate (WER)	3.6	—	Unverified
6	Convolutional Speech Recognition	Word Error Rate (WER)	3.5	—	Unverified
7	TC-DNN-BLSTM-DNN	Word Error Rate (WER)	3.5	—	Unverified
8	Espresso	Word Error Rate (WER)	3.4	—	Unverified
9	CTC-CRF VGG-BLSTM	Word Error Rate (WER)	3.2	—	Unverified
10	Transformer with Relaxed Attention	Word Error Rate (WER)	3.19	—	Unverified