Speech Recognition

Speech recognition is the task of converting spoken language into text: recognizing the words in an audio signal and transcribing them in written form. The goal is accurate transcription, either in real time or from recorded audio, that stays robust to accents, speaking speed, and background noise.


Papers


Showing 6351–6400 of 6433 papers

Title	Date	Tasks	Status
The InproTK 2012 release	Jun 1, 2012	Dialogue Management, Speech Recognition	Unverified
Up from Limited Dialog Systems!	Jun 1, 2012	Speech Recognition	Unverified
HRItk: The Human-Robot Interaction ToolKit Rapid Development of Speech-Centric Interactive Systems in ROS	Jun 1, 2012	Gesture Recognition, Object Recognition	Unverified
Intra-Speaker Topic Modeling for Improved Multi-Party Meeting Summarization with Integrated Random Walk	Jun 1, 2012	Meeting Summarization, Speech Recognition	Unverified
Unsupervised Vocabulary Adaptation for Morph-based Language Models	Jun 1, 2012	Language Modelling, MORPH	Unverified
Deep Neural Network Language Models	Jun 1, 2012	Language Modelling, Machine Translation	Unverified
A belief tracking challenge task for spoken dialog systems	Jun 1, 2012	Speech Recognition	Unverified
Implicitly Intersecting Weighted Automata using Dual Decomposition	Jun 1, 2012	Combinatorial Optimization, Language Modelling	Unverified
Real-time Incremental Speech-to-Speech Translation of Dialogs	Jun 1, 2012	Machine Translation, Speech Recognition	Unverified
Exploring Content Features for Automated Speech Scoring	Jun 1, 2012	Semantic Textual Similarity, Speech Recognition	Unverified
A Challenge Set for Advancing Language Modeling	Jun 1, 2012	Language Modeling, Language Modelling	Unverified
Using Ontology-based Approaches to Representing Speech Transcripts for Automated Speech Scoring	Jun 1, 2012	Speech Recognition	Unverified
The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation	May 1, 2012	Machine Translation, speech-recognition	Unverified
BUCEADOR, a multi-language search engine for digital libraries	May 1, 2012	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Building a 70 billion word corpus of English from ClueWeb	May 1, 2012	Machine Translation, Management	Unverified
Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora	May 1, 2012	Semantic Composition, Speech Recognition	Unverified
The KIT Lecture Corpus for Speech Translation	May 1, 2012	Speech Recognition, Translation	Unverified
Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz	May 1, 2012	Speech Recognition	Unverified
Simplified guidelines for the creation of Large Scale Dialectal Arabic Annotations	May 1, 2012	Speech Recognition	Unverified
LDC Forced Aligner	May 1, 2012	Sentence, Speech Recognition	Unverified
The DISCO ASR-based CALL system: practicing L2 oral skills and beyond	May 1, 2012	speech-recognition, Speech Recognition	Unverified
Prosomarker: a prosodic analysis tool based on optimal pitch stylization and automatic syllabification	May 1, 2012	Boundary Detection, Speech Recognition	Unverified
The META-SHARE Language Resources Sharing Infrastructure: Principles, Challenges, Solutions	May 1, 2012	Machine Translation, Speech Recognition	Unverified
Development of Text and Speech database for Hindi and Indian English specific to Mobile Communication environment	May 1, 2012	Language Identification, Speech Recognition	Unverified
Resource Evaluation for Usable Speech Interfaces: Utilizing Human-Human Dialogue	May 1, 2012	Dialogue Management, Language Modelling	Unverified
Using an ASR database to design a pronunciation evaluation system in Basque	May 1, 2012	Clustering, Speech Recognition	Unverified
A Mandarin-English Code-Switching Corpus	May 1, 2012	Boundary Detection, Language Identification	Unverified
A hierarchical approach with feature selection for emotion recognition from speech	May 1, 2012	Classification, Dimensionality Reduction	Unverified
Building Text-To-Speech Voices in the Cloud	May 1, 2012	Speech Recognition, Speech Synthesis	Unverified
RWTH-PHOENIX-Weather: A Large Vocabulary Sign Language Recognition and Translation Corpus	May 1, 2012	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Dysarthric Speech Database for Development of QoLT Software Technology	May 1, 2012	Speech Recognition	Unverified
Suffix Trees as Language Models	May 1, 2012	Information Retrieval, Language Modeling	Unverified
Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis	May 1, 2012	Audio-Visual Speech Recognition, Speech Recognition	Unverified
Syntactic annotation of spontaneous speech: application to call-center conversation data	May 1, 2012	Dependency Parsing, POS	Unverified
A Scalable Architecture For Web Deployment of Spoken Dialogue Systems	May 1, 2012	Dialogue Management, Management	Unverified
Item Development and Scoring for Japanese Oral Proficiency Testing	May 1, 2012	Language Modeling, Language Modelling	Unverified
Evaluating Appropriateness Of System Responses In A Spoken CALL Game	May 1, 2012	Machine Translation, Speech Recognition	Unverified
Holaaa!! writin like u talk is kewl but kinda hard 4 NLP	May 1, 2012	Domain Adaptation, Language Modelling	Unverified
Constructive Interaction for Talking about Interesting Topics	May 1, 2012	Management, Speech Recognition	Unverified
A Corpus for a Gesture-Controlled Mobile Spoken Dialogue System	May 1, 2012	Speech Recognition, Spoken Dialogue Systems	Unverified
CoALT: A Software for Comparing Automatic Labelling Tools	May 1, 2012	Speech Recognition, Speech Synthesis	Unverified
TED-LIUM: an Automatic Speech Recognition dedicated corpus	May 1, 2012	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Multimodal Corpus of Multi-party Conversations in Second Language	May 1, 2012	Speech Recognition	Unverified
Korean Children's Spoken English Corpus and an Analysis of its Pronunciation Variability	May 1, 2012	Speech Recognition	Unverified
Cross-lingual studies of ASR errors: paradigms for perceptual evaluations	May 1, 2012	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
The Herme Database of Spontaneous Multimodal Human-Robot Dialogues	May 1, 2012	Gesture Recognition, Speech Recognition	Unverified
From keystrokes to annotated process data: Enriching the output of Inputlog with linguistic information	May 1, 2012	Speech Recognition	Unverified
DECODA: a call-centre human-human spoken conversation corpus	May 1, 2012	Speech Recognition	Unverified
Statistical Evaluation of Pronunciation Encoding	May 1, 2012	Speech Recognition, Speech Synthesis	Unverified
The Twins Corpus of Museum Visitor Questions	May 1, 2012	Dialogue Management, Natural Language Understanding	Unverified


Datasets: LibriSpeech test-clean, LibriSpeech test-other, Switchboard + Hub500, TIMIT, AISHELL-1, WSJ eval92, Common Voice German, swb_hub_500 WER fullSWBCH, TUDA, Common Voice French, Common Voice Spanish, MediaSpeech

Benchmark Results

LibriSpeech test-clean

#	Model	Metric	Claimed	Verified	Status
1	AmNet	Word Error Rate (WER)	8.6	—	Unverified
2	HMM-(SAT)GMM	Word Error Rate (WER)	8	—	Unverified
3	Local Prior Matching (Large Model)	Word Error Rate (WER)	7.19	—	Unverified
4	Snips	Word Error Rate (WER)	6.4	—	Unverified
5	Li-GRU	Word Error Rate (WER)	6.2	—	Unverified
6	HMM-DNN + pNorm*	Word Error Rate (WER)	5.5	—	Unverified
7	CTC + policy learning	Word Error Rate (WER)	5.42	—	Unverified
8	Deep Speech 2	Word Error Rate (WER)	5.33	—	Unverified
9	HMM-TDNN + iVectors	Word Error Rate (WER)	4.8	—	Unverified
10	Gated ConvNets	Word Error Rate (WER)	4.8	—	Unverified

LibriSpeech test-other

#	Model	Metric	Claimed	Verified	Status
1	Local Prior Matching (Large Model)	Word Error Rate (WER)	20.84	—	Unverified
2	Snips	Word Error Rate (WER)	16.5	—	Unverified
3	Local Prior Matching (Large Model, ConvLM LM)	Word Error Rate (WER)	15.28	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	13.25	—	Unverified
5	TDNN + pNorm + speed up/down speech	Word Error Rate (WER)	12.5	—	Unverified
6	CTC-CRF 4gram-LM	Word Error Rate (WER)	10.65	—	Unverified
7	Convolutional Speech Recognition	Word Error Rate (WER)	10.47	—	Unverified
8	MT4SSL	Word Error Rate (WER)	9.6	—	Unverified
9	Jasper DR 10x5	Word Error Rate (WER)	8.79	—	Unverified
10	Espresso	Word Error Rate (WER)	8.7	—	Unverified

Switchboard + Hub500

#	Model	Metric	Claimed	Verified	Status
1	Deep Speech	Percentage error	20	—	Unverified
2	DNN-HMM	Percentage error	18.5	—	Unverified
3	CD-DNN	Percentage error	16.1	—	Unverified
4	DNN	Percentage error	16	—	Unverified
5	DNN + Dropout	Percentage error	15	—	Unverified
6	DNN BMMI	Percentage error	12.9	—	Unverified
7	DNN MPE	Percentage error	12.9	—	Unverified
8	DNN MMI	Percentage error	12.9	—	Unverified
9	HMM-TDNN + pNorm + speed up/down speech	Percentage error	12.9	—	Unverified
10	HMM-DNN +sMBR	Percentage error	12.6	—	Unverified

TIMIT

#	Model	Metric	Claimed	Verified	Status
1	LSNN	Percentage error	33.2	—	Unverified
2	LAS multitask with indicators sampling	Percentage error	20.4	—	Unverified
3	Soft Monotonic Attention (ours, offline)	Percentage error	20.1	—	Unverified
4	QCNN-10L-256FM	Percentage error	19.64	—	Unverified
5	Bi-LSTM + skip connections w/ CTC	Percentage error	17.7	—	Unverified
6	Bi-RNN + Attention	Percentage error	17.6	—	Unverified
7	RNN-CRF on 24(x3) MFSC	Percentage error	17.3	—	Unverified
8	CNN in time and frequency + dropout, 17.6% w/o dropout	Percentage error	16.7	—	Unverified
9	Light Gated Recurrent Units	Percentage error	16.7	—	Unverified
10	GRU	Percentage error	16.6	—	Unverified

AISHELL-1

#	Model	Metric	Claimed	Verified	Status
1	Att	Word Error Rate (WER)	18.7	—	Unverified
2	CTC/Att	Word Error Rate (WER)	6.7	—	Unverified
3	BRA-E	Word Error Rate (WER)	6.63	—	Unverified
4	CTC-CRF 4gram-LM	Word Error Rate (WER)	6.34	—	Unverified
5	BAT	Word Error Rate (WER)	4.97	—	Unverified
6	Paraformer	Word Error Rate (WER)	4.95	—	Unverified
7	U2	Word Error Rate (WER)	4.72	—	Unverified
8	UMA	Word Error Rate (WER)	4.7	—	Unverified
9	Lightweight Transducer	Word Error Rate (WER)	4.31	—	Unverified
10	CIF-HKD With LM	Word Error Rate (WER)	4.1	—	Unverified

WSJ eval92

#	Model	Metric	Claimed	Verified	Status
1	Jasper 10x3	Word Error Rate (WER)	6.9	—	Unverified
2	CNN over RAW speech (wav)	Word Error Rate (WER)	5.6	—	Unverified
3	CTC-CRF 4gram-LM	Word Error Rate (WER)	3.79	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	3.6	—	Unverified
5	test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm*	Word Error Rate (WER)	3.6	—	Unverified
6	Convolutional Speech Recognition	Word Error Rate (WER)	3.5	—	Unverified
7	TC-DNN-BLSTM-DNN	Word Error Rate (WER)	3.5	—	Unverified
8	Espresso	Word Error Rate (WER)	3.4	—	Unverified
9	CTC-CRF VGG-BLSTM	Word Error Rate (WER)	3.2	—	Unverified
10	Transformer with Relaxed Attention	Word Error Rate (WER)	3.19	—	Unverified
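
Most of the tables above report Word Error Rate (WER). As a rough illustration of what that number means, WER is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a system hypothesis, divided by the number of reference words. The sketch below assumes whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and the scoring pipelines behind the claimed results may well differ (for example in casing, punctuation handling, or character-level scoring for Mandarin).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance.

    Assumption: words are separated by whitespace and compared exactly;
    no casing or punctuation normalization is applied.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference, so it is an error rate relative to the reference length and not a bounded accuracy complement.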