SOTAVerified

Automatic Speech Recognition

Papers

Showing 726-750 of 3174 papers

Title | Status | Hype
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data | | 0
FastInject: Injecting Unpaired Text Data into CTC-based ASR training | | 0
Audio-visual fine-tuning of audio-only ASR models | | 0
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models | | 0
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition | | 0
Extending Whisper with prompt tuning to target-speaker ASR | Code | 1
Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification | | 0
Creating Spoken Dialog Systems in Ultra-Low Resourced Settings | | 0
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning | Code | 0
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models | | 0
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition | | 0
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training | Code | 0
End-to-End Speech-to-Text Translation: A Survey | | 0
End-to-end Joint Punctuated and Normalized ASR with a Limited Amount of Punctuated Training Data | | 0
D4AM: A General Denoising Framework for Downstream Acoustic Models | Code | 1
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors | Code | 0
Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR | | 0
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild | Code | 0
Soft Random Sampling: A Theoretical and Empirical Analysis | | 0
App for Resume-Based Job Matching with Speech Interviews and Grammar Analysis: A Review | | 0
How does end-to-end speech recognition training impact speech enhancement artifacts? | | 0
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding | | 0
Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition | | 0
Multi-channel Conversational Speaker Separation via Neural Diarization | | 0
Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer | | 0

No leaderboard results yet.