SOTAVerified

Automatic Speech Recognition

Papers

Showing 701-750 of 3174 papers

Title | Status | Hype
High-precision Voice Search Query Correction via Retrievable Speech-text Embedings | | 0
Exploratory Evaluation of Speech Content Masking | | 0
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Code | 0
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | | 0
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge | | 0
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | Code | 3
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR | Code | 0
Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition | Code | 0
Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models | | 0
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition | | 0
Towards Probing Contact Center Large Language Models | | 0
The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge | | 0
Exploring data augmentation in bias mitigation against non-native-accented speech | | 0
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification | | 0
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition | | 0
Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models | | 0
Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition | Code | 0
Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition? | | 0
SpokesBiz -- an Open Corpus of Conversational Polish | | 0
Efficiency-oriented approaches for self-supervised speech representation learning | | 0
OAVA: the open audio-visual archives aggregator | | 0
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices | | 0
Seq2seq for Automatic Paraphasia Detection in Aphasic Speech | Code | 0
Generative Context-aware Fine-tuning of Self-supervised Speech Models | | 0
Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition | | 0
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data | | 0
FastInject: Injecting Unpaired Text Data into CTC-based ASR training | | 0
Audio-visual fine-tuning of audio-only ASR models | | 0
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models | | 0
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition | | 0
Extending Whisper with prompt tuning to target-speaker ASR | Code | 1
Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification | | 0
Creating Spoken Dialog Systems in Ultra-Low Resourced Settings | | 0
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning | Code | 0
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models | | 0
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition | | 0
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training | Code | 0
End-to-End Speech-to-Text Translation: A Survey | | 0
End-to-end Joint Punctuated and Normalized ASR with a Limited Amount of Punctuated Training Data | | 0
D4AM: A General Denoising Framework for Downstream Acoustic Models | Code | 1
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors | Code | 0
Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR | | 0
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild | Code | 0
Soft Random Sampling: A Theoretical and Empirical Analysis | | 0
App for Resume-Based Job Matching with Speech Interviews and Grammar Analysis: A Review | | 0
How does end-to-end speech recognition training impact speech enhancement artifacts? | | 0
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding | | 0
Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition | | 0
Multi-channel Conversational Speaker Separation via Neural Diarization | | 0
Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer | | 0

No leaderboard results yet.