Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Neuro-oscillatory models of cortical speech processing | Feb 18, 2025 | speech-recognition, Speech Recognition | Unverified 0
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | Feb 17, 2025 | Lip to Speech Synthesis, speech-recognition | Unverified 0
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Feb 16, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge | Feb 14, 2025 | Action Detection, Activity Detection | Unverified 0
A Preliminary Exploration with GPT-4o Voice Mode | Feb 14, 2025 | Age Classification, Audio Deepfake Detection | Unverified 0
MTLM: Incorporating Bidirectional Text Information to Enhance Language Model Training in Speech Recognition Systems | Feb 14, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models | Feb 14, 2025 | speech-recognition, Speech Recognition | Unverified 0
Shortcut Learning Susceptibility in Vision Classifiers | Feb 13, 2025 | speech-recognition, Speech Recognition | Unverified 0
Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors | Feb 12, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | Feb 11, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Unverified 0
VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification | Feb 11, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
Speech to Speech Translation with Translatotron: A State of the Art Review | Feb 9, 2025 | speech-recognition, Speech Recognition | Unverified 0
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Feb 9, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available 1
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance | Feb 7, 2025 | Automatic Speech Recognition, Decoder | Unverified 0
Lightweight Operations for Visual Speech Recognition | Feb 7, 2025 | speech-recognition, Speech Recognition | Unverified 0
Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance | Feb 7, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers | Feb 6, 2025 | Automatic Speech Recognition, Decoder | Unverified 0
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond | Feb 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling | Feb 5, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 0
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Feb 3, 2025 | Audio-Visual Speech Recognition, Decoder | Code Available 3
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | Feb 3, 2025 | Audio-Visual Speech Recognition, speech-recognition | Unverified 0
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition | Feb 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Gradient Norm-based Fine-Tuning for Backdoor Defense in Automatic Speech Recognition | Feb 3, 2025 | Automatic Speech Recognition, backdoor defense | Unverified 0
A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport | Feb 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions | Feb 1, 2025 | Lipreading, speech-recognition | Code Available 0
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | Feb 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language | Feb 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition | Feb 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions | Jan 31, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition | Jan 31, 2025 | Contrastive Learning, Diversity | Unverified 0
Language Bias in Self-Supervised Learning For Automatic Speech Recognition | Jan 31, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Privacy-Preserving Edge Speech Understanding with Tiny Foundation Models | Jan 29, 2025 | Privacy Preserving, Robust Speech Recognition | Unverified 0
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition | Jan 29, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
SCDiar: a streaming diarization system based on speaker change detection and speech recognition | Jan 28, 2025 | Change Detection, speaker-diarization | Unverified 0
RDMM: Fine-Tuned LLM Models for On-Device Robotic Decision Making with Enhanced Contextual Awareness in Specific Domains | Jan 28, 2025 | Decision Making, speech-recognition | Code Available 0
Classification Error Bound for Low Bayes Error Conditions in Machine Learning | Jan 27, 2025 | Automatic Speech Recognition, Classification | Unverified 0
SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation | Jan 26, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario | Jan 26, 2025 | Decoder, speech-recognition | Unverified 0
The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders? | Jan 25, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Speech Translation Refinement using Large Language Models | Jan 25, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 0
Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition | Jan 25, 2025 | speech-recognition, Speech Recognition | Unverified 0
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Jan 24, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 5
LoCoML: A Framework for Real-World ML Inference Pipelines | Jan 24, 2025 | Automatic Speech Recognition, Machine Translation | Unverified 0
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Jan 23, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available 1
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Jan 23, 2025 | Emotion Recognition, Event Detection | Code Available 3
Learning-based A Posteriori Speech Presence Probability Estimation and Applications | Jan 23, 2025 | Speech Enhancement, speech-recognition | Unverified 0
Integrating Persian Lip Reading in Surena-V Humanoid Robot for Human-Robot Interaction | Jan 23, 2025 | Landmark Tracking, Lip Reading | Unverified 0
DQ-Data2vec: Decoupling Quantization for Multilingual Speech Recognition | Jan 23, 2025 | Quantization, Representation Learning | Unverified 0