SOTAVerified

Automatic Speech Recognition

Papers

Showing 951-1000 of 3174 papers

Title | Status | Hype
Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models | - | 0
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition | - | 0
Towards Probing Contact Center Large Language Models | - | 0
The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge | - | 0
Exploring data augmentation in bias mitigation against non-native-accented speech | - | 0
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification | - | 0
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition | - | 0
Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models | - | 0
Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition | Code | 0
Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition? | - | 0
SpokesBiz -- an Open Corpus of Conversational Polish | - | 0
Efficiency-oriented approaches for self-supervised speech representation learning | - | 0
Seq2seq for Automatic Paraphasia Detection in Aphasic Speech | Code | 0
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices | - | 0
OAVA: the open audio-visual archives aggregator | - | 0
Generative Context-aware Fine-tuning of Self-supervised Speech Models | - | 0
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data | - | 0
Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition | - | 0
Audio-visual fine-tuning of audio-only ASR models | - | 0
FastInject: Injecting Unpaired Text Data into CTC-based ASR training | - | 0
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition | - | 0
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models | - | 0
Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification | - | 0
Creating Spoken Dialog Systems in Ultra-Low Resourced Settings | - | 0
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning | Code | 0
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models | - | 0
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition | - | 0
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training | Code | 0
End-to-End Speech-to-Text Translation: A Survey | - | 0
End-to-end Joint Punctuated and Normalized ASR with a Limited Amount of Punctuated Training Data | - | 0
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors | Code | 0
Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR | - | 0
Soft Random Sampling: A Theoretical and Empirical Analysis | - | 0
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild | Code | 0
How does end-to-end speech recognition training impact speech enhancement artifacts? | - | 0
App for Resume-Based Job Matching with Speech Interviews and Grammar Analysis: A Review | - | 0
Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition | - | 0
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding | - | 0
Multi-channel Conversational Speaker Separation via Neural Diarization | - | 0
Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer | - | 0
Retrieve and Copy: Scaling ASR Personalization to Large Catalogs | - | 0
On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition | - | 0
1SPU: 1-step Speech Processing Unit | - | 0
A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition | Code | 0
Fine-tuning convergence model in Bengali speech recognition | - | 0
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition | Code | 0
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning | - | 0
Server-side Rescoring of Spoken Entity-centric Knowledge Queries for Virtual Assistants | - | 0
RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios | - | 0
Combining Language Models For Specialized Domains: A Colorful Approach | - | 0
