Speech Recognition

Speech recognition is the task of converting spoken language into text: identifying the words in an audio signal and transcribing them in written form. The goal is accurate transcription, either in real time or from recorded audio, that stays robust to accents, speaking rate, and background noise.


Papers


Showing 4801–4850 of 6433 papers

Title	Date	Tasks	Status	Hype
Introspection for convolutional automatic speech recognition	Nov 1, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
'Indicatements' that character language models learn English morpho-syntactic units and regularities	Nov 1, 2018	Feature Engineering, Language Modeling	Unverified	0
Sisyphus, a Workflow Manager Designed for Machine Translation and Automatic Speech Recognition	Nov 1, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
PizzaPal: Conversational Pizza Ordering using a High-Density Conversational AI Platform	Nov 1, 2018	dialog state tracking, Speech Recognition	Unverified	0
Visualizing Group Dynamics based on Multiparty Meeting Understanding	Nov 1, 2018	Decision Making, Opinion Mining	Unverified	0
Unauthorized AI cannot Recognize Me: Reversible Adversarial Example	Nov 1, 2018	Adversarial Attack, BIG-bench Machine Learning	Unverified	0
On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition	Nov 1, 2018	Data Augmentation, Language Identification	Code Available	0
How2: A Large-scale Dataset for Multimodal Language Understanding	Nov 1, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Tropical Modeling of Weighted Transducer Algorithms on Graphs	Nov 1, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition	Oct 31, 2018	speech-recognition, Speech Recognition	Unverified	0
End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator	Oct 31, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Attention-based sequence-to-sequence model for speech recognition: development of state-of-the-art system on LibriSpeech and its application to non-native English	Oct 31, 2018	speech-recognition, Speech Recognition	Unverified	0
Towards End-to-End Code-Switching Speech Recognition	Oct 31, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Towards End-to-end Automatic Code-Switching Speech Recognition	Oct 30, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech	Oct 30, 2018	Speech Recognition, Voice Conversion	Code Available	0
Almost-unsupervised Speech Recognition with Close-to-zero Resource Based on Phonetic Structures Learned from Very Small Unpaired Speech and Text Data	Oct 30, 2018	speech-recognition, Speech Recognition	Unverified	0
Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation	Oct 30, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	0
Cascaded CNN-resBiLSTM-CTC: An End-to-End Acoustic Model For Speech Recognition	Oct 29, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Contextual Speech Recognition with Difficult Negative Training Examples	Oct 29, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training	Oct 28, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	0
Hypergraph based semi-supervised learning algorithms applied to speech recognition problem: a novel approach	Oct 28, 2018	Sensitivity, speech-recognition	Unverified	0
Robust Audio Adversarial Example for a Physical Attack	Oct 28, 2018	speech-recognition, Speech Recognition	Code Available	0
Neuron Activation Profiles for Interpreting Convolutional Speech Recognition Models	Oct 26, 2018	Clustering, speech-recognition	Unverified	0
Scaling Speech Enhancement in Unseen Environments with Noise Embeddings	Oct 26, 2018	Speech Enhancement, speech-recognition	Unverified	0
Speaker Selective Beamformer with Keyword Mask Estimation	Oct 25, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Tackling Sequence to Sequence Mapping Problems with Neural Networks	Oct 25, 2018	Domain Adaptation, Feature Engineering	Unverified	0
The MeMAD Submission to the IWSLT 2018 Speech Translation Task	Oct 24, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
A Deep Generative Acoustic Model for Compositional Automatic Speech Recognition	Oct 23, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Language Modeling at Scale	Oct 23, 2018	GPU, Language Modeling	Unverified	0
Semi-supervised acoustic model training for speech with code-switching	Oct 23, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Learned in Speech Recognition: Contextual Acoustic Word Embeddings	Oct 22, 2018	Sentence, speech-recognition	Unverified	0
Targeted Adversarial Examples for Black Box Audio Systems	Oct 22, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Robust Speech Command Recognition Using Label-Driven Time-Frequency Masking	Oct 22, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
How transferable are features in convolutional neural network acoustic models across languages?	Oct 22, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
A comprehensive analysis on attention models	Oct 22, 2018	speech-recognition, Speech Recognition	Unverified	0
Cycle-Consistent GAN Front-End to Improve ASR Robustness to Perturbed Speech	Oct 22, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation	Oct 22, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Improved Speech Enhancement with the Wave-U-Net	Oct 22, 2018	Audio Source Separation, Speech Enhancement	Unverified	0
Transferable and Configurable Audio Adversarial Attack from Low-Level Features	Oct 22, 2018	Adversarial Attack, Automatic Speech Recognition	Unverified	0
On the Inductive Bias of Word-Character-Level Multi-Task Learning for Speech Recognition	Oct 22, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Robust Domain Adaptation By Augmented Cyclic Adversarial Learning	Oct 22, 2018	Domain Adaptation, speech-recognition	Unverified	0
Proactive Security: Embedded AI Solution for Violent and Abusive Speech Recognition	Oct 22, 2018	Data Augmentation, speech-recognition	Unverified	0
Interpretable Convolutional Filters with SincNet	Oct 21, 2018	Inductive Bias, speech-recognition	Unverified	0
Hierarchical Text Generation using an Outline	Oct 20, 2018	Dialogue Generation, speech-recognition	Code Available	0
EdgeSpeechNets: Highly Efficient Deep Neural Networks for Speech Recognition on the Edge	Oct 18, 2018	speech-recognition, Speech Recognition	Unverified	0
Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation	Oct 17, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild	Oct 16, 2018	Lipreading, Lip Reading	Code Available	0
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks	Oct 16, 2018	Evolutionary Algorithms, Language Modeling	Code Available	0
Speech Recognition with Quaternion Neural Networks	Oct 15, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
3D Feature Pyramid Attention Module for Robust Visual Speech Recognition	Oct 15, 2018	Lipreading, Sentence	Unverified	0



Datasets: LibriSpeech test-clean, LibriSpeech test-other, Switchboard + Hub500, TIMIT, AISHELL-1, WSJ eval92, Common Voice German, swb_hub_500 WER fullSWBCH, TUDA, Common Voice French, Common Voice Spanish, MediaSpeech

Benchmark Results

LibriSpeech test-clean

#	Model	Metric	Claimed	Verified	Status
1	AmNet	Word Error Rate (WER)	8.6	—	Unverified
2	HMM-(SAT)GMM	Word Error Rate (WER)	8	—	Unverified
3	Local Prior Matching (Large Model)	Word Error Rate (WER)	7.19	—	Unverified
4	Snips	Word Error Rate (WER)	6.4	—	Unverified
5	Li-GRU	Word Error Rate (WER)	6.2	—	Unverified
6	HMM-DNN + pNorm*	Word Error Rate (WER)	5.5	—	Unverified
7	CTC + policy learning	Word Error Rate (WER)	5.42	—	Unverified
8	Deep Speech 2	Word Error Rate (WER)	5.33	—	Unverified
9	HMM-TDNN + iVectors	Word Error Rate (WER)	4.8	—	Unverified
10	Gated ConvNets	Word Error Rate (WER)	4.8	—	Unverified

LibriSpeech test-other

#	Model	Metric	Claimed	Verified	Status
1	Local Prior Matching (Large Model)	Word Error Rate (WER)	20.84	—	Unverified
2	Snips	Word Error Rate (WER)	16.5	—	Unverified
3	Local Prior Matching (Large Model, ConvLM LM)	Word Error Rate (WER)	15.28	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	13.25	—	Unverified
5	TDNN + pNorm + speed up/down speech	Word Error Rate (WER)	12.5	—	Unverified
6	CTC-CRF 4gram-LM	Word Error Rate (WER)	10.65	—	Unverified
7	Convolutional Speech Recognition	Word Error Rate (WER)	10.47	—	Unverified
8	MT4SSL	Word Error Rate (WER)	9.6	—	Unverified
9	Jasper DR 10x5	Word Error Rate (WER)	8.79	—	Unverified
10	Espresso	Word Error Rate (WER)	8.7	—	Unverified

Switchboard + Hub500

#	Model	Metric	Claimed	Verified	Status
1	Deep Speech	Percentage error	20	—	Unverified
2	DNN-HMM	Percentage error	18.5	—	Unverified
3	CD-DNN	Percentage error	16.1	—	Unverified
4	DNN	Percentage error	16	—	Unverified
5	DNN + Dropout	Percentage error	15	—	Unverified
6	DNN BMMI	Percentage error	12.9	—	Unverified
7	DNN MPE	Percentage error	12.9	—	Unverified
8	DNN MMI	Percentage error	12.9	—	Unverified
9	HMM-TDNN + pNorm + speed up/down speech	Percentage error	12.9	—	Unverified
10	HMM-DNN +sMBR	Percentage error	12.6	—	Unverified

TIMIT

#	Model	Metric	Claimed	Verified	Status
1	LSNN	Percentage error	33.2	—	Unverified
2	LAS multitask with indicators sampling	Percentage error	20.4	—	Unverified
3	Soft Monotonic Attention (ours, offline)	Percentage error	20.1	—	Unverified
4	QCNN-10L-256FM	Percentage error	19.64	—	Unverified
5	Bi-LSTM + skip connections w/ CTC	Percentage error	17.7	—	Unverified
6	Bi-RNN + Attention	Percentage error	17.6	—	Unverified
7	RNN-CRF on 24(x3) MFSC	Percentage error	17.3	—	Unverified
8	CNN in time and frequency + dropout, 17.6% w/o dropout	Percentage error	16.7	—	Unverified
9	Light Gated Recurrent Units	Percentage error	16.7	—	Unverified
10	GRU	Percentage error	16.6	—	Unverified

AISHELL-1

#	Model	Metric	Claimed	Verified	Status
1	Att	Word Error Rate (WER)	18.7	—	Unverified
2	CTC/Att	Word Error Rate (WER)	6.7	—	Unverified
3	BRA-E	Word Error Rate (WER)	6.63	—	Unverified
4	CTC-CRF 4gram-LM	Word Error Rate (WER)	6.34	—	Unverified
5	BAT	Word Error Rate (WER)	4.97	—	Unverified
6	Paraformer	Word Error Rate (WER)	4.95	—	Unverified
7	U2	Word Error Rate (WER)	4.72	—	Unverified
8	UMA	Word Error Rate (WER)	4.7	—	Unverified
9	Lightweight Transducer	Word Error Rate (WER)	4.31	—	Unverified
10	CIF-HKD With LM	Word Error Rate (WER)	4.1	—	Unverified

WSJ eval92

#	Model	Metric	Claimed	Verified	Status
1	Jasper 10x3	Word Error Rate (WER)	6.9	—	Unverified
2	CNN over RAW speech (wav)	Word Error Rate (WER)	5.6	—	Unverified
3	CTC-CRF 4gram-LM	Word Error Rate (WER)	3.79	—	Unverified
4	Deep Speech 2	Word Error Rate (WER)	3.6	—	Unverified
5	test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm*	Word Error Rate (WER)	3.6	—	Unverified
6	Convolutional Speech Recognition	Word Error Rate (WER)	3.5	—	Unverified
7	TC-DNN-BLSTM-DNN	Word Error Rate (WER)	3.5	—	Unverified
8	Espresso	Word Error Rate (WER)	3.4	—	Unverified
9	CTC-CRF VGG-BLSTM	Word Error Rate (WER)	3.2	—	Unverified
10	Transformer with Relaxed Attention	Word Error Rate (WER)	3.19	—	Unverified
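
Most of the tables above report Word Error Rate (WER). As a rough illustration of how that metric is conventionally defined, the sketch below computes WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name `word_error_rate` and the plain whitespace tokenization are assumptions made for this example; published results normally apply benchmark-specific text normalization and scoring tools, so this is not the procedure behind any number listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no case folding or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(f"WER: {100 * word_error_rate(ref, hyp):.2f}%")  # prints "WER: 33.33%"
```

The function returns a fraction, while the tables list percentages, so multiply by 100 to compare. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.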