Inclusivity of AI Speech in Healthcare: A Decade Look Back | May 15, 2025 | speech-recognition, Speech Recognition | Unverified | 0
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio | May 15, 2025 | Speaker Identification, speech-recognition | Unverified | 0
Full simulation on the dynamics of auditory synaptic fusion: Strong clustering of calcium channel might be the origin of the coherent release in the auditory hair cells | May 12, 2025 | speech-recognition, Speech Recognition | Unverified | 0
Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients | May 9, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations | May 8, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement | May 7, 2025 | Robust Speech Recognition, Speech Enhancement | Unverified | 0
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer | May 7, 2025 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0
Fairness of Automatic Speech Recognition in Cleft Lip and Palate Speech | May 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation | May 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | May 6, 2025 | Active Speaker Detection, Audio-Visual Speech Recognition | Code Available | 2
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition | Code Available | 3
Transforming faces into video stories -- VideoFace2.0 | May 4, 2025 | Face Detection, Face Recognition | Code Available | 0
A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction | May 4, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments | May 2, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Retrieval-Enhanced Few-Shot Prompting for Speech Event Extraction | Apr 30, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition | Apr 30, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
Kimi-Audio Technical Report | Apr 25, 2025 | Audio Question Answering, Question Answering | Code Available | 7
TinyML for Speech Recognition | Apr 22, 2025 | speech-recognition, Speech Recognition | Code Available | 0
Development and evaluation of a deep learning algorithm for German word recognition from lip movements | Apr 22, 2025 | Lip Reading, speech-recognition | Unverified | 0
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Apr 21, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models | Apr 21, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope | Apr 17, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning | Apr 16, 2025 | Arabic Speech Recognition, Automatic Speech Recognition | Unverified | 0
Dysarthria Normalization via Local Lie Group Transformations for Robust ASR | Apr 16, 2025 | Robust Speech Recognition, speech-recognition | Code Available | 0
Spatial Audio Processing with Large Language Model on Wearable Devices | Apr 11, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Summarizing Speech: A Comprehensive Survey | Apr 10, 2025 | Meeting Summarization, speech-recognition | Unverified | 0
RNN-Transducer-based Losses for Speech Recognition on Noisy Targets | Apr 9, 2025 | speech-recognition, Speech Recognition | Code Available | 0
Visual-Aware Speech Recognition for Noisy Scenarios | Apr 9, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0
DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation | Apr 7, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations | Apr 4, 2025 | speech-recognition, Speech Recognition | Unverified | 0
Edge Intelligence for Wildlife Conservation: Real-Time Hornbill Call Classification Using TinyML | Apr 3, 2025 | Edge-computing, speech-recognition | Unverified | 0
LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect | Apr 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Chain of Correction for Full-text Speech Recognition with Large Language Models | Apr 2, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Transformer-Based Named Entity Recognition for Automated Server Provisioning | Apr 1, 2025 | named-entity-recognition, Named Entity Recognition | Code Available | 0
Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems | Apr 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection | Mar 31, 2025 | Fraud Detection, Large Language Model | Code Available | 2
The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR | Mar 30, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages | Mar 30, 2025 | Automatic Speech Recognition, Language Modeling | Code Available | 1
Scaling Auditory Cognition via Test-Time Compute in Audio Language Models | Mar 30, 2025 | speech-recognition, Speech Recognition | Unverified | 0
A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network | Mar 27, 2025 | Quantization, speech-recognition | Unverified | 0
VALLR: Visual ASR Language Model for Lip Reading | Mar 27, 2025 | Automatic Speech Recognition, Language Modeling | Unverified | 0
Efficient First-Order Optimization on the Pareto Set for Multi-Objective Learning under Preference Guidance | Mar 26, 2025 | Bilevel Optimization, Fairness | Unverified | 0
Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit | Mar 26, 2025 | speech-recognition, Speech Recognition | Unverified | 0
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications | Mar 26, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Qwen2.5-Omni Technical Report | Mar 26, 2025 | Automatic Speech Recognition (ASR), GSM8K | Code Available | 7
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages | Mar 26, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy | Mar 25, 2025 | Benchmarking, speech-recognition | Unverified | 0
Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization | Mar 25, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Coverage-Guaranteed Speech Emotion Recognition via Calibrated Uncertainty-Adaptive Prediction Sets | Mar 24, 2025 | Conformal Prediction, Emotion Recognition | Unverified | 0