Speech Recognition

Speech recognition is the task of converting spoken language into text: identifying the words spoken in an audio signal and transcribing them in written form. The goal is accurate transcription, either in real time or from recorded audio, that stays robust to factors such as accents, speaking rate, and background noise.

(Image credit: SpecAugment)

Papers


Showing 6001–6050 of 6433 papers

Title	Date	Tasks	Status
Situated Incremental Natural Language Understanding using a Multimodal, Linguistically-driven Update Model	Aug 1, 2014	Dialogue Management, Natural Language Understanding	Unverified
Class-Based Language Modeling for Translating into Morphologically Rich Languages	Aug 1, 2014	Domain Adaptation, Language Modeling	Unverified
Confusion Network for Arabic Name Disambiguation and Transliteration in Statistical Machine Translation	Aug 1, 2014	Machine Translation, Speech Recognition	Unverified
Quality Estimation for Automatic Speech Recognition	Aug 1, 2014	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Recurrent Neural Network-based Tuple Sequence Model for Machine Translation	Aug 1, 2014	Language Modelling, Machine Translation	Unverified
A PAC-Bayesian Approach to Minimum Perplexity Language Modeling	Aug 1, 2014	Language Modeling, Language Modelling	Unverified
Learning from 26 Languages: Program Management and Science in the Babel Program	Aug 1, 2014	Management, Speech Recognition	Unverified
The Effect of Sensor Errors in Situated Human-Computer Dialogue	Aug 1, 2014	Speech Recognition	Unverified
Automatically building a Tunisian Lexicon for Deverbal Nouns	Aug 1, 2014	Speech Recognition	Unverified
Key Event Detection in Video using ASR and Visual Data	Aug 1, 2014	Event Detection, Face Alignment	Unverified
Employing Phonetic Speech Recognition for Language and Dialect Specific Search	Aug 1, 2014	Information Retrieval, Keyword Spotting	Unverified
Developing further speech recognition resources for Welsh	Aug 1, 2014	speech-recognition, Speech Recognition	Unverified
Trainable and Dynamic Computing: Error Backpropagation through Physical Media	Jul 24, 2014	speech-recognition, Speech Recognition	Unverified
Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition	Jul 4, 2014	Audio-Visual Speech Recognition, Automatic Speech Recognition	Unverified
Building DNN Acoustic Models for Large Vocabulary Speech Recognition	Jun 30, 2014	speech-recognition, Speech Recognition	Code Available
On the Use of Different Feature Extraction Methods for Linear and Non Linear kernels	Jun 27, 2014	Robust Speech Recognition, Speaker Identification	Unverified
Dropout: A Simple Way to Prevent Neural Networks from Overfitting	Jun 1, 2014	Document Classification, speech-recognition	Unverified
Towards End-To-End Speech Recognition with Recurrent Neural Networks	Jun 1, 2014	Language Modeling, Language Modelling	Unverified
Markovian Discriminative Modeling for Dialog State Tracking	Jun 1, 2014	dialog state tracking, Speech Recognition	Unverified
Optimizing Generative Dialog State Tracker via Cascading Gradient Descent	Jun 1, 2014	Speech Recognition, Spoken Language Understanding	Unverified
Alex: Bootstrapping a Spoken Dialogue System for a New Domain by Real Users	Jun 1, 2014	Dialogue Management, Language Modelling	Unverified
SAWDUST: a Semi-Automated Wizard Dialogue Utterance Selection Tool for domain-independent large-domain dialogue	Jun 1, 2014	Speech Recognition	Unverified
Evaluating a Spoken Dialogue System that Detects and Adapts to User Affective States	Jun 1, 2014	Speech Recognition	Unverified
Aided diagnosis of dementia type through computer-based analysis of spontaneous speech	Jun 1, 2014	Speech Recognition	Unverified
InproTKs: A Toolkit for Incremental Situated Processing	Jun 1, 2014	Gesture Recognition, Speech Recognition	Unverified
Word-Based Dialog State Tracking with Recurrent Neural Networks	Jun 1, 2014	dialog state tracking, Feature Engineering	Unverified
MVA: The Multimodal Virtual Assistant	Jun 1, 2014	Speech Recognition, Speech Synthesis	Unverified
Revisiting Word Neighborhoods for Speech Recognition	Jun 1, 2014	speech-recognition, Speech Recognition	Unverified
Extrinsic Evaluation of Dialog State Tracking and Predictive Metrics for Dialog Policy Optimization	Jun 1, 2014	dialog state tracking, Speech Recognition	Unverified
Dive deeper: Deep Semantics for Sentiment Analysis	Jun 1, 2014	Machine Translation, Named Entity Recognition (NER)	Unverified
The SJTU System for Dialog State Tracking Challenge 2	Jun 1, 2014	dialog state tracking, Dialogue State Tracking	Unverified
Comparative Error Analysis of Dialog State Tracking	Jun 1, 2014	dialog state tracking, Speech Recognition	Unverified
Free on-line speech recogniser based on Kaldi ASR toolkit producing word posterior lattices	Jun 1, 2014	Acoustic Modelling, Language Modelling	Unverified
Unsupervised Adaptation for Statistical Machine Translation	Jun 1, 2014	Domain Adaptation, Language Modelling	Unverified
The PARLANCE mobile application for interactive search in English and Mandarin	Jun 1, 2014	Speech Recognition, Spoken Language Understanding	Unverified
Detecting Health Related Discussions in Everyday Telephone Conversations for Studying Medical Events in the Lives of Older Adults	Jun 1, 2014	Speech Recognition	Unverified
Sequential Labeling for Tracking Dynamic Dialog States	Jun 1, 2014	Slot Filling, Speech Recognition	Unverified
Using Ellipsis Detection and Word Similarity for Transformation of Spoken Language into Grammatically Valid Sentences	Jun 1, 2014	Semantic Textual Similarity, Speech Recognition	Unverified
Bayesian Reordering Model with Feature Selection	Jun 1, 2014	feature selection, Machine Translation	Unverified
A Demonstration of Dialogue Processing in SimSensei Kiosk	Jun 1, 2014	Dialogue Management, Speech Recognition	Unverified
Web-style ranking and SLU combination for dialog state tracking	Jun 1, 2014	dialog state tracking, Speech Recognition	Unverified
Dialogue Strategy Learning in Healthcare: A Systematic Approach for Learning Dialogue Models from Data	Jun 1, 2014	Decision Making, Speech Recognition	Unverified
LingSync & the Online Linguistic Database: New Models for the Collection and Management of Data for Language Communities, Linguists and Language Learners	Jun 1, 2014	Management, Speech Recognition	Unverified
Automatic evaluation of spoken summaries: the case of language assessment	Jun 1, 2014	Speech Recognition	Unverified
Speech recognition in Alzheimer's disease with personal assistive robots	Jun 1, 2014	Object Recognition, speech-recognition	Unverified
Preliminary Test of a Real-Time, Interactive Silent Speech Interface Based on Electromagnetic Articulograph	Jun 1, 2014	Speech Recognition, Visual Speech Recognition	Unverified
Individuality-preserving Voice Conversion for Articulation Disorders Using Dictionary Selective Non-negative Matrix Factorization	Jun 1, 2014	Speech Recognition, Voice Conversion	Unverified
Syllable and language model based features for detecting non-scorable tests in spoken language proficiency assessment applications	Jun 1, 2014	Language Modeling, Language Modelling	Unverified
Automated scoring of speaking items in an assessment for teachers of English as a Foreign Language	Jun 1, 2014	Speech Recognition	Unverified
Short-Term Projects, Long-Term Benefits: Four Student NLP Projects for Low-Resource Languages	Jun 1, 2014	Language Identification, Named Entity Recognition (NER)	Unverified



Datasets: LibriSpeech test-clean, LibriSpeech test-other, Switchboard + Hub500, TIMIT, AISHELL-1, WSJ eval92, Common Voice German, swb_hub_500 WER fullSWBCH, TUDA, Common Voice French, Common Voice Spanish, MediaSpeech

Benchmark Results

LibriSpeech test-clean
#	Model	Metric	Claimed	Verified	Status
1	AmNet	Word Error Rate (WER)	8.6	—	Unverified
2	HMM-(SAT)GMM	Word Error Rate (WER)	8	—	Unverified
3	Local Prior Matching (Large Model)	Word Error Rate (WER)	7.19	—	Unverified
4	Snips	Word Error Rate (WER)	6.4	—	Unverified
5	Li-GRU	Word Error Rate (WER)	6.2	—	Unverified
6	HMM-DNN + pNorm*	Word Error Rate (WER)	5.5	—	Unverified
7	CTC + policy learning	Word Error Rate (WER)	5.42	—	Unverified
8	Deep Speech 2	Word Error Rate (WER)	5.33	—	Unverified
9	HMM-TDNN + iVectors	Word Error Rate (WER)	4.8	—	Unverified
10	Gated ConvNets	Word Error Rate (WER)	4.8	—	Unverified

LibriSpeech test-other
#	Model	Metric	Claimed	Verified	Status
1	Local Prior Matching (Large Model)	Word Error Rate (WER)	20.84	—	Unverified
2	Snips	Word Error Rate (WER)	16.5	—	Unverified
3	Local Prior Matching (Large Model, ConvLM LM)	Word Error Rate (WER)	15.28	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	13.25	—	Unverified
5	TDNN + pNorm + speed up/down speech	Word Error Rate (WER)	12.5	—	Unverified
6	CTC-CRF 4gram-LM	Word Error Rate (WER)	10.65	—	Unverified
7	Convolutional Speech Recognition	Word Error Rate (WER)	10.47	—	Unverified
8	MT4SSL	Word Error Rate (WER)	9.6	—	Unverified
9	Jasper DR 10x5	Word Error Rate (WER)	8.79	—	Unverified
10	Espresso	Word Error Rate (WER)	8.7	—	Unverified

Switchboard + Hub500
#	Model	Metric	Claimed	Verified	Status
1	Deep Speech	Percentage error	20	—	Unverified
2	DNN-HMM	Percentage error	18.5	—	Unverified
3	CD-DNN	Percentage error	16.1	—	Unverified
4	DNN	Percentage error	16	—	Unverified
5	DNN + Dropout	Percentage error	15	—	Unverified
6	DNN BMMI	Percentage error	12.9	—	Unverified
7	DNN MPE	Percentage error	12.9	—	Unverified
8	DNN MMI	Percentage error	12.9	—	Unverified
9	HMM-TDNN + pNorm + speed up/down speech	Percentage error	12.9	—	Unverified
10	HMM-DNN +sMBR	Percentage error	12.6	—	Unverified

TIMIT
#	Model	Metric	Claimed	Verified	Status
1	LSNN	Percentage error	33.2	—	Unverified
2	LAS multitask with indicators sampling	Percentage error	20.4	—	Unverified
3	Soft Monotonic Attention (ours, offline)	Percentage error	20.1	—	Unverified
4	QCNN-10L-256FM	Percentage error	19.64	—	Unverified
5	Bi-LSTM + skip connections w/ CTC	Percentage error	17.7	—	Unverified
6	Bi-RNN + Attention	Percentage error	17.6	—	Unverified
7	RNN-CRF on 24(x3) MFSC	Percentage error	17.3	—	Unverified
8	CNN in time and frequency + dropout, 17.6% w/o dropout	Percentage error	16.7	—	Unverified
9	Light Gated Recurrent Units	Percentage error	16.7	—	Unverified
10	GRU	Percentage error	16.6	—	Unverified

AISHELL-1
#	Model	Metric	Claimed	Verified	Status
1	Att	Word Error Rate (WER)	18.7	—	Unverified
2	CTC/Att	Word Error Rate (WER)	6.7	—	Unverified
3	BRA-E	Word Error Rate (WER)	6.63	—	Unverified
4	CTC-CRF 4gram-LM	Word Error Rate (WER)	6.34	—	Unverified
5	BAT	Word Error Rate (WER)	4.97	—	Unverified
6	Paraformer	Word Error Rate (WER)	4.95	—	Unverified
7	U2	Word Error Rate (WER)	4.72	—	Unverified
8	UMA	Word Error Rate (WER)	4.7	—	Unverified
9	Lightweight Transducer	Word Error Rate (WER)	4.31	—	Unverified
10	CIF-HKD With LM	Word Error Rate (WER)	4.1	—	Unverified

WSJ eval92
#	Model	Metric	Claimed	Verified	Status
1	Jasper 10x3	Word Error Rate (WER)	6.9	—	Unverified
2	CNN over RAW speech (wav)	Word Error Rate (WER)	5.6	—	Unverified
3	CTC-CRF 4gram-LM	Word Error Rate (WER)	3.79	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	3.6	—	Unverified
5	test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm*	Word Error Rate (WER)	3.6	—	Unverified
6	Convolutional Speech Recognition	Word Error Rate (WER)	3.5	—	Unverified
7	TC-DNN-BLSTM-DNN	Word Error Rate (WER)	3.5	—	Unverified
8	Espresso	Word Error Rate (WER)	3.4	—	Unverified
9	CTC-CRF VGG-BLSTM	Word Error Rate (WER)	3.2	—	Unverified
10	Transformer with Relaxed Attention	Word Error Rate (WER)	3.19	—	Unverified
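
The Word Error Rate (WER) metric reported in the tables above is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The following is a minimal Python sketch of that definition, not the scoring procedure used for any entry listed here. The function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumption: published results usually apply dataset-specific text normalization (casing, punctuation, number formatting) before scoring, so figures from this sketch may not match a paper's reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    S, D, I are the substitutions, deletions, and insertions in a minimum
    edit-distance alignment; N is the number of words in the reference.
    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```

Because insertions count as errors, WER can exceed 100 when the output is much longer than the reference. The "Percentage error" metric in some tables may be computed over a different unit (for example phones rather than words); that is an assumption, as the page does not define it.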