SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is to transcribe speech accurately despite variations in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 126-150 of 3012 papers

Title | Status | Hype
Towards Improved Room Impulse Response Estimation for Speech Recognition | Code | 1
Multi-blank Transducers for Speech Recognition | Code | 1
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing | Code | 1
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup | Code | 1
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning | Code | 1
Automatic Severity Classification of Dysarthric speech by using Self-supervised Model with Multi-task Learning | Code | 1
There is more than one kind of robustness: Fooling Whisper with adversarial examples | Code | 1
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition | Code | 1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation | Code | 1
Towards Relation Extraction From Speech | Code | 1
Can we use Common Voice to train a Multi-Speaker TTS system? | Code | 1
A context-aware knowledge transferring strategy for CTC-based ASR | Code | 1
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT | Code | 1
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations | Code | 1
TVLT: Textless Vision-Language Transformer | Code | 1
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM | Code | 1
Deep Sparse Conformer for Speech Recognition | Code | 1
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages | Code | 1
ASR Error Correction with Constrained Decoding on Operation Prediction | Code | 1
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition | Code | 1
Improving Mandarin Speech Recogntion with Block-augmented Transformer | Code | 1
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription | Code | 1
MM-ALT: A Multimodal Automatic Lyric Transcription System | Code | 1
Distilling a Pretrained Language Model to a Multilingual ASR Model | Code | 1
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement | Code | 1
Page 6 of 121

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 |  | Unverified
2 | TM-seq2seq | Test WER | 9.7 |  | Unverified
3 | CTC/attention | Test WER | 8.2 |  | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 |  | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 |  | Unverified
6 | End2end Conformer | Test WER | 3.9 |  | Unverified
7 | End2end Conformer | Test WER | 3.7 |  | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 |  | Unverified
9 | CTC/Attention | Test WER | 1.5 |  | Unverified
10 | Whisper | Test WER | 1.3 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 |  | Unverified
2 | CleanMel-L-mask | CER | 14.4 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 |  | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 |  | Unverified
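
The benchmark results on this page are reported as word error rate (WER) or character error rate (CER). For readers unfamiliar with these metrics, the sketch below shows the standard definition: the edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the reference length and expressed as a percentage. It is a minimal illustration, not the scoring script used for any entry listed here. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and the choices to split on whitespace, skip text normalization, and drop spaces before computing CER are assumptions; published results typically apply benchmark-specific normalization that can shift the numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    # prev[j] = distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing from output)
                curr[j - 1] + 1,     # insertion (extra token in output)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumption: whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces are ignored)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference transcript must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER: {wer(reference, hypothesis):.2f}%")
    print(f"CER: {cer(reference, hypothesis):.2f}%")
```

Because the denominator is the reference length, WER can exceed 100% when the output contains many insertions, and lower values are better for both metrics.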