SOTAVerified

Automatic Speech Recognition

Papers

Showing 51-100 of 3174 papers

Title | Status | Hype
NusaCrowd: Open Source Initiative for Indonesian NLP Resources | Code | 2
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Code | 2
Towards A Unified Conformer Structure: from ASR to ASV Task | Code | 2
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement | Code | 2
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning | Code | 2
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | Code | 2
4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Code | 2
CMGAN: Conformer-based Metric GAN for Speech Enhancement | Code | 2
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Code | 2
Robust Self-Supervised Audio-Visual Speech Recognition | Code | 2
Fast Transformers with Clustered Attention | Code | 2
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | Code | 1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | Code | 1
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages | Code | 1
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Code | 1
VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification | Code | 1
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Code | 1
Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language | Code | 1
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction | Code | 1
Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation | Code | 1
MathSpeech: Leveraging Small LMs for Accurate Conversion in Mathematical Speech-to-Formula | Code | 1
XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection | Code | 1
Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention | Code | 1
VHASR: A Multimodal Speech Recognition System With Vision Hotwords | Code | 1
Mamba for Streaming ASR Combined with Unimodal Aggregation | Code | 1
SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition | Code | 1
LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition | Code | 1
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Code | 1
Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction | Code | 1
Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish | Code | 1
Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models | Code | 1
Improving Self-supervised Pre-training using Accent-Specific Codebooks | Code | 1
Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models | Code | 1
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Code | 1
Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model | Code | 1
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Code | 1
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech | Code | 1
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition | Code | 1
A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition | Code | 1
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | Code | 1
Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models | Code | 1
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets | Code | 1
Less Peaky and More Accurate CTC Forced Alignment by Label Priors | Code | 1
Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal | Code | 1
Speech Robust Bench: A Robustness Benchmark For Speech Recognition | Code | 1
Language and Speech Technology for Central Kurdish Varieties | Code | 1
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition | Code | 1
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition | Code | 1
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR | Code | 1
Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric | Code | 1

No leaderboard results yet.