WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models | Feb 20, 2025 | Automatic Speech Recognition, RAG | Unverified | 0
Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks | Feb 19, 2025 | Automatic Speech Recognition, speech-recognition | Code Available | 0
Adopting Whisper for Confidence Estimation | Feb 19, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Neuro-oscillatory models of cortical speech processing | Feb 18, 2025 | speech-recognition, Speech Recognition | Unverified | 0
On the Robust Approximation of ASR Metrics | Feb 18, 2025 | speech-recognition, Speech Recognition | Unverified | 0
Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization | Feb 18, 2025 | Automatic Speech Recognition, Speaker Identification | Unverified | 0
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | Feb 17, 2025 | Lip to Speech Synthesis, speech-recognition | Unverified | 0
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models | Feb 14, 2025 | speech-recognition, Speech Recognition | Unverified | 0
Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge | Feb 14, 2025 | Action Detection, Activity Detection | Unverified | 0
A Preliminary Exploration with GPT-4o Voice Mode | Feb 14, 2025 | Age Classification, Audio Deepfake Detection | Unverified | 0
MTLM: Incorporating Bidirectional Text Information to Enhance Language Model Training in Speech Recognition Systems | Feb 14, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Shortcut Learning Susceptibility in Vision Classifiers | Feb 13, 2025 | speech-recognition, Speech Recognition | Unverified | 0
Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors | Feb 12, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | Feb 11, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Unverified | 0
Speech to Speech Translation with Translatotron: A State of the Art Review | Feb 9, 2025 | speech-recognition, Speech Recognition | Unverified | 0
Lightweight Operations for Visual Speech Recognition | Feb 7, 2025 | speech-recognition, Speech Recognition | Unverified | 0
Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance | Feb 7, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance | Feb 7, 2025 | Automatic Speech Recognition, Decoder | Unverified | 0
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers | Feb 6, 2025 | Automatic Speech Recognition, Decoder | Unverified | 0
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond | Feb 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling | Feb 5, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | Feb 3, 2025 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0
Gradient Norm-based Fine-Tuning for Backdoor Defense in Automatic Speech Recognition | Feb 3, 2025 | Automatic Speech Recognition, backdoor defense | Unverified | 0
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition | Feb 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport | Feb 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions | Feb 1, 2025 | Lipreading, speech-recognition | Code Available | 0
Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition | Feb 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | Feb 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions | Jan 31, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Language Bias in Self-Supervised Learning For Automatic Speech Recognition | Jan 31, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition | Jan 31, 2025 | Contrastive Learning, Diversity | Unverified | 0
Privacy-Preserving Edge Speech Understanding with Tiny Foundation Models | Jan 29, 2025 | Privacy Preserving, Robust Speech Recognition | Unverified | 0
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition | Jan 29, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
SCDiar: a streaming diarization system based on speaker change detection and speech recognition | Jan 28, 2025 | Change Detection, speaker-diarization | Unverified | 0
RDMM: Fine-Tuned LLM Models for On-Device Robotic Decision Making with Enhanced Contextual Awareness in Specific Domains | Jan 28, 2025 | Decision Making, speech-recognition | Code Available | 0
Classification Error Bound for Low Bayes Error Conditions in Machine Learning | Jan 27, 2025 | Automatic Speech Recognition, Classification | Unverified | 0
End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario | Jan 26, 2025 | Decoder, speech-recognition | Unverified | 0
SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation | Jan 26, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition | Jan 25, 2025 | speech-recognition, Speech Recognition | Unverified | 0
The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders? | Jan 25, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Speech Translation Refinement using Large Language Models | Jan 25, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
LoCoML: A Framework for Real-World ML Inference Pipelines | Jan 24, 2025 | Automatic Speech Recognition, Machine Translation | Unverified | 0
Learning-based A Posteriori Speech Presence Probability Estimation and Applications | Jan 23, 2025 | Speech Enhancement, speech-recognition | Unverified | 0
Integrating Persian Lip Reading in Surena-V Humanoid Robot for Human-Robot Interaction | Jan 23, 2025 | Landmark Tracking, Lip Reading | Unverified | 0
DQ-Data2vec: Decoupling Quantization for Multilingual Speech Recognition | Jan 23, 2025 | Quantization, Representation Learning | Unverified | 0
Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing | Jan 23, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Development of an Inclusive Educational Platform Using Open Technologies and Machine Learning: A Case Study on Accessibility Enhancement | Jan 22, 2025 | Object Recognition, speech-recognition | Unverified | 0