Speech Recognition

Speech recognition is the task of converting spoken language into text: recognizing the words spoken in an audio signal and transcribing them in written form. The goal is accurate transcription, either in real time or from recorded audio, that stays robust to factors such as accents, speaking rate, and background noise.

Papers

Showing 4501–4550 of 6433 papers

Title	Date	Tasks	Status
The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese	Feb 12, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The CAPIO 2017 Conversational Speech Recognition System	Dec 29, 2017	image-classification, Image Classification	Unverified
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System	Oct 18, 2023	Automatic Speech Recognition, speaker-diarization	Unverified
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios	Jun 23, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization	Jul 23, 2024	Automatic Speech Recognition, Distant Speech Recognition	Unverified
The coding and annotation of multimodal dialogue acts	May 1, 2012	Speech Recognition	Unverified
The Cohort and Speechify Libraries for Rapid Construction of Speech Enabled Applications for Android	Sep 1, 2015	Action Detection, Speech Recognition	Unverified
The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge	Feb 4, 2022	Action Detection, Activity Detection	Unverified
The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design	Nov 13, 2019	BIG-bench Machine Learning, Natural Language Understanding	Unverified
The design and implementation of Language Learning Chatbot with XAI using Ontology and Transfer Learning	Sep 29, 2020	Chatbot, Explainable artificial intelligence	Unverified
The Dialog State Tracking Challenge	Aug 1, 2013	dialog state tracking, Speech Recognition	Unverified
The DIRHA Portuguese Corpus: A Comparison of Home Automation Command Detection and Recognition in Simulated and Real Data.	May 1, 2016	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The DIRHA simulated corpus	May 1, 2014	Dialogue Management, Distant Speech Recognition	Unverified
The DISCO ASR-based CALL system: practicing L2 oral skills and beyond	May 1, 2012	speech-recognition, Speech Recognition	Unverified
The EASR Corpora of European Portuguese, French, Hungarian and Polish Elderly Speech	May 1, 2014	Speech Recognition	Unverified
The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR	Mar 31, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition	Jan 13, 2022	Generative Adversarial Network, Phoneme Recognition	Unverified
The Effect of Cognitive Load on a Statistical Dialogue System	Jul 1, 2012	Speech Recognition	Unverified
The Effect of Dependency Representation Scheme on Syntactic Language Modelling	Nov 1, 2014	Constituency Parsing, Dependency Parsing	Unverified
The Effect of Sensor Errors in Situated Human-Computer Dialogue	Aug 1, 2014	Speech Recognition	Unverified
The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages	Feb 21, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The ETAPE corpus for the evaluation of speech-based TV content processing in the French language	May 1, 2012	Speech Recognition	Unverified
The ETAPE speech processing evaluation	May 1, 2014	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The evaluation of a code-switched Sepedi-English automatic speech recognition system	Mar 11, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language	Sep 12, 2024	Automatic Speech Recognition, speech-recognition	Unverified
SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems	Jul 13, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines	Mar 28, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models	Jul 1, 2015	Information Retrieval, Language Modelling	Unverified
The Future of Spoken Dialogue Systems is in their Past: Long-Term Adaptive, Conversational Assistants	Jun 1, 2012	Language Modelling, Speech Recognition	Unverified
The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning	Sep 29, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The GUA-Speech System Description for CNVSRC Challenge 2023	Dec 12, 2023	Decoder, Language Modeling	Unverified
The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis	Aug 1, 2013	Dependency Parsing, Feature Engineering	Unverified
The Herme Database of Spontaneous Multimodal Human-Robot Dialogues	May 1, 2012	Gesture Recognition, Speech Recognition	Unverified
The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches	Mar 3, 2018	Deep Learning, Deep Reinforcement Learning	Unverified
The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation	Aug 9, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The HW-TSC’s Offline Speech Translation System for IWSLT 2022 Evaluation	May 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The IBM 2015 English Conversational Telephone Speech Recognition System	May 21, 2015	Language Modeling, Language Modelling	Unverified
The IBM 2016 English Conversational Telephone Speech Recognition System	Apr 27, 2016	Language Modeling, Language Modelling	Unverified
The IBM 2016 Speaker Recognition System	Feb 23, 2016	2k, Automatic Speech Recognition	Unverified
The IBM Speaker Recognition System: Recent Advances and Error Analysis	May 5, 2016	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The ILMT-s2s Corpus ― A Multimodal Interlingual Map Task Corpus	May 1, 2016	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR	Mar 30, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The Importance of Recommender and Feedback Features in a Pronunciation Learning Aid	Jul 1, 2018	Information Retrieval, Recommendation Systems	Unverified
The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software	Dec 1, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The InproTK 2012 release	Jun 1, 2012	Dialogue Management, Speech Recognition	Unverified
The IOIT English ASR system for IWSLT 2016	Dec 1, 2016	Language Modeling, Language Modelling	Unverified
The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation	May 1, 2012	Machine Translation, speech-recognition	Unverified
The IWSLT 2016 Evaluation Campaign	Dec 1, 2016	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The IWSLT 2019 KIT Speech Translation System	Nov 1, 2019	speech-recognition, Speech Recognition	Unverified
The IWSLT 2021 BUT Speech Translation Systems	Jul 13, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified

Datasets: LibriSpeech test-clean, LibriSpeech test-other, Switchboard + Hub500, TIMIT, AISHELL-1, WSJ eval92, Common Voice German, swb_hub_500 WER fullSWBCH, TUDA, Common Voice French, Common Voice Spanish, MediaSpeech

Benchmark Results

LibriSpeech test-clean

#	Model	Metric	Claimed	Verified	Status
1	AmNet	Word Error Rate (WER)	8.6	—	Unverified
2	HMM-(SAT)GMM	Word Error Rate (WER)	8	—	Unverified
3	Local Prior Matching (Large Model)	Word Error Rate (WER)	7.19	—	Unverified
4	Snips	Word Error Rate (WER)	6.4	—	Unverified
5	Li-GRU	Word Error Rate (WER)	6.2	—	Unverified
6	HMM-DNN + pNorm*	Word Error Rate (WER)	5.5	—	Unverified
7	CTC + policy learning	Word Error Rate (WER)	5.42	—	Unverified
8	Deep Speech 2	Word Error Rate (WER)	5.33	—	Unverified
9	HMM-TDNN + iVectors	Word Error Rate (WER)	4.8	—	Unverified
10	Gated ConvNets	Word Error Rate (WER)	4.8	—	Unverified

LibriSpeech test-other

#	Model	Metric	Claimed	Verified	Status
1	Local Prior Matching (Large Model)	Word Error Rate (WER)	20.84	—	Unverified
2	Snips	Word Error Rate (WER)	16.5	—	Unverified
3	Local Prior Matching (Large Model, ConvLM LM)	Word Error Rate (WER)	15.28	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	13.25	—	Unverified
5	TDNN + pNorm + speed up/down speech	Word Error Rate (WER)	12.5	—	Unverified
6	CTC-CRF 4gram-LM	Word Error Rate (WER)	10.65	—	Unverified
7	Convolutional Speech Recognition	Word Error Rate (WER)	10.47	—	Unverified
8	MT4SSL	Word Error Rate (WER)	9.6	—	Unverified
9	Jasper DR 10x5	Word Error Rate (WER)	8.79	—	Unverified
10	Espresso	Word Error Rate (WER)	8.7	—	Unverified

Switchboard + Hub500

#	Model	Metric	Claimed	Verified	Status
1	Deep Speech	Percentage error	20	—	Unverified
2	DNN-HMM	Percentage error	18.5	—	Unverified
3	CD-DNN	Percentage error	16.1	—	Unverified
4	DNN	Percentage error	16	—	Unverified
5	DNN + Dropout	Percentage error	15	—	Unverified
6	DNN BMMI	Percentage error	12.9	—	Unverified
7	DNN MPE	Percentage error	12.9	—	Unverified
8	DNN MMI	Percentage error	12.9	—	Unverified
9	HMM-TDNN + pNorm + speed up/down speech	Percentage error	12.9	—	Unverified
10	HMM-DNN +sMBR	Percentage error	12.6	—	Unverified

TIMIT

#	Model	Metric	Claimed	Verified	Status
1	LSNN	Percentage error	33.2	—	Unverified
2	LAS multitask with indicators sampling	Percentage error	20.4	—	Unverified
3	Soft Monotonic Attention (ours, offline)	Percentage error	20.1	—	Unverified
4	QCNN-10L-256FM	Percentage error	19.64	—	Unverified
5	Bi-LSTM + skip connections w/ CTC	Percentage error	17.7	—	Unverified
6	Bi-RNN + Attention	Percentage error	17.6	—	Unverified
7	RNN-CRF on 24(x3) MFSC	Percentage error	17.3	—	Unverified
8	CNN in time and frequency + dropout, 17.6% w/o dropout	Percentage error	16.7	—	Unverified
9	Light Gated Recurrent Units	Percentage error	16.7	—	Unverified
10	GRU	Percentage error	16.6	—	Unverified

AISHELL-1

#	Model	Metric	Claimed	Verified	Status
1	Att	Word Error Rate (WER)	18.7	—	Unverified
2	CTC/Att	Word Error Rate (WER)	6.7	—	Unverified
3	BRA-E	Word Error Rate (WER)	6.63	—	Unverified
4	CTC-CRF 4gram-LM	Word Error Rate (WER)	6.34	—	Unverified
5	BAT	Word Error Rate (WER)	4.97	—	Unverified
6	Paraformer	Word Error Rate (WER)	4.95	—	Unverified
7	U2	Word Error Rate (WER)	4.72	—	Unverified
8	UMA	Word Error Rate (WER)	4.7	—	Unverified
9	Lightweight Transducer	Word Error Rate (WER)	4.31	—	Unverified
10	CIF-HKD With LM	Word Error Rate (WER)	4.1	—	Unverified

WSJ eval92

#	Model	Metric	Claimed	Verified	Status
1	Jasper 10x3	Word Error Rate (WER)	6.9	—	Unverified
2	CNN over RAW speech (wav)	Word Error Rate (WER)	5.6	—	Unverified
3	CTC-CRF 4gram-LM	Word Error Rate (WER)	3.79	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	3.6	—	Unverified
5	test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm*	Word Error Rate (WER)	3.6	—	Unverified
6	Convolutional Speech Recognition	Word Error Rate (WER)	3.5	—	Unverified
7	TC-DNN-BLSTM-DNN	Word Error Rate (WER)	3.5	—	Unverified
8	Espresso	Word Error Rate (WER)	3.4	—	Unverified
9	CTC-CRF VGG-BLSTM	Word Error Rate (WER)	3.2	—	Unverified
10	Transformer with Relaxed Attention	Word Error Rate (WER)	3.19	—	Unverified
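
Most of the tables above report Word Error Rate (WER). As a rough illustration of how such a figure is commonly computed, the sketch below takes the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divides it by the number of reference words, and expresses the result as a percentage. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the listed papers may normalize text (casing, punctuation, numerals) differently, so this is not necessarily the exact scoring used for any entry.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Assumes simple whitespace tokenization with no text normalization
    (an illustrative choice, not a benchmark-specific scoring rule).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.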