SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is accurate transcription despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 2026-2050 of 3012 papers

| Title | Status | Hype |
| --- | --- | --- |
| Lookahead When It Matters: Adaptive Non-causal Transformers for Streaming Neural Transducers | | 0 |
| Looking Enhances Listening: Recovering Missing Speech Using Images | | 0 |
| Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use | | 0 |
| LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR | | 0 |
| Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent | | 0 |
| Loss Prediction: End-to-End Active Learning Approach For Speech Recognition | | 0 |
| Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models | | 0 |
| Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech | | 0 |
| Low Latency ASR for Simultaneous Speech Translation | | 0 |
| Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network | | 0 |
| Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning | | 0 |
| Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training | | 0 |
| Low Resource German ASR with Untranscribed Data Spoken by Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System | | 0 |
| Low-Resource Machine Transliteration Using Recurrent Neural Networks of Asian Languages | | 0 |
| LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition | | 0 |
| SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network | | 0 |
| LUPET: Incorporating Hierarchical Information Path into Multilingual ASR | | 0 |
| LV-CTC: Non-autoregressive ASR with CTC and latent variable models | | 0 |
| Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics | | 0 |
| Machine Speech Chain with One-shot Speaker Adaptation | | 0 |
| MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition | | 0 |
| Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0 | | 0 |
| Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian | | 0 |
| Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation | | 0 |
| Malayalam Speech Corpus: Design and Development for Dravidian Language | | 0 |

Benchmark Results

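| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TM-CTC | Test WER | 10.1 | | Unverified |
| 2 | TM-seq2seq | Test WER | 9.7 | | Unverified |
| 3 | CTC/attention | Test WER | 8.2 | | Unverified |
| 4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified |
| 5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified |
| 6 | End2end Conformer | Test WER | 3.9 | | Unverified |
| 7 | End2end Conformer | Test WER | 3.7 | | Unverified |
| 8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified |
| 9 | CTC/Attention | Test WER | 1.5 | | Unverified |
| 10 | Whisper | Test WER | 1.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SpatialNet | CER | 14.5 | | Unverified |
| 2 | CleanMel-L-mask | CER | 14.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer | Test WER | 15.32 | | Unverified |
| 2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 1.89 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DistillAV | WER | 1.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 4.28 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 8.04 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer | WER (%) | 3.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified |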
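
The benchmark tables above report word error rate (WER) and character error rate (CER). As a rough illustration of what those numbers measure, here is a minimal sketch assuming the standard definition: the edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the reference length and expressed as a percentage. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and the sketch applies no text normalization (casing, punctuation), whereas individual benchmarks may normalize text in their own ways before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; spaces count as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(round(wer(ref, hyp), 2))  # 33.33
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference, so it is an error rate rather than a bounded accuracy score.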