SOTAVerified

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) converts spoken language into written text, often in real time, so that people can interact with computers, mobile devices, and other technology by voice. The goal is to transcribe speech accurately despite variation in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade speech quality.

Papers

Showing 2001-2050 of 3012 papers

Title | Status | Hype
Lightly Supervised Quality Estimation | | 0
Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform | | 0
Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions | | 0
Lightweight Prompt Biasing for Contextualized End-to-End ASR Systems | | 0
Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR | | 0
Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition | | 0
LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect | | 0
LinTO Platform: A Smart Open Voice Assistant for Business Environments | | 0
LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models | | 0
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models | | 0
Listening Comprehension over Argumentative Content | | 0
Listening while Speaking: Speech Chain by Deep Learning | | 0
Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End | | 0
LiSTra, Automatic Speech Translation: English to Lingala case study | | 0
LiSTra Automatic Speech Translation: English to Lingala Case Study | | 0
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data | | 0
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale | | 0
LLM-based phoneme-to-grapheme for phoneme-based speech recognition | | 0
LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | | 0
Local Feature or Mel Frequency Cepstral Coefficients - Which One is Better for MLN-Based Bangla Speech Recognition? | | 0
Locality enhanced dynamic biasing and sampling strategies for contextual ASR | | 0
Local Monotonic Attention Mechanism for End-to-End Speech and Language Processing | | 0
Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features | | 0
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer | | 0
Incorporating VAD into ASR System by Multi-task Learning | | 0
Lookahead When It Matters: Adaptive Non-causal Transformers for Streaming Neural Transducers | | 0
Looking Enhances Listening: Recovering Missing Speech Using Images | | 0
Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use | | 0
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR | | 0
Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent | | 0
Loss Prediction: End-to-End Active Learning Approach For Speech Recognition | | 0
Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models | | 0
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech | | 0
Low Latency ASR for Simultaneous Speech Translation | | 0
Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network | | 0
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning | | 0
Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training | | 0
Low Resource German ASR with Untranscribed Data Spoken by Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System | | 0
Low-Resource Machine Transliteration Using Recurrent Neural Networks of Asian Languages | | 0
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition | | 0
SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network | | 0
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR | | 0
LV-CTC: Non-autoregressive ASR with CTC and latent variable models | | 0
Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics | | 0
Machine Speech Chain with One-shot Speaker Adaptation | | 0
MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition | | 0
Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0 | | 0
Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian | | 0
Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation | | 0
Malayalam Speech Corpus: Design and Development for Dravidian Language | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TM-CTC | Test WER | 10.1 | | Unverified
2 | TM-seq2seq | Test WER | 9.7 | | Unverified
3 | CTC/attention | Test WER | 8.2 | | Unverified
4 | LF-MMI TDNN | Test WER | 6.7 | | Unverified
5 | Whisper-LLaMA | Test WER | 6.6 | | Unverified
6 | End2end Conformer | Test WER | 3.9 | | Unverified
7 | End2end Conformer | Test WER | 3.7 | | Unverified
8 | MoCo + wav2vec (w/o extLM) | Test WER | 2.7 | | Unverified
9 | CTC/Attention | Test WER | 1.5 | | Unverified
10 | Whisper | Test WER | 1.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SpatialNet | CER | 14.5 | | Unverified
2 | CleanMel-L-mask | CER | 14.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer | Test WER | 15.32 | | Unverified
2 | Whisper-largev3-finetuned | Test WER | 10.82 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 1.89 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DistillAV | WER | 1.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 4.28 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 8.04 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer | WER (%) | 3.36 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer Transducer (German) | WER (%) | 8.98 | | Unverified
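
The benchmark entries report word error rate (WER) and character error rate (CER). As a rough illustration of how such scores are usually obtained, the sketch below computes the edit distance between a reference and a hypothesis transcript and divides it by the reference length. The function names are hypothetical, and the tokenization (whitespace splitting for WER, every character including spaces for CER) is an assumption; the listed papers may normalize text differently, so this is not the scoring procedure behind any particular entry.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis token)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction; assumes whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction; assumes spaces count as characters."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution and one deletion over six reference words.
    print(f"WER: {100 * wer(ref, hyp):.2f}%")
    print(f"CER: {100 * cer(ref, hyp):.2f}%")
```

Multiplying the returned fraction by 100 gives a percentage, the scale the "WER (%)" entries are labeled with. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.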