NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech; Jul 17, 2025; Automatic Speech Recognition (ASR); Unverified 0
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis; Jul 8, 2025; Data Augmentation, Mixture-of-Experts; Unverified 0
A Hybrid Machine Learning Framework for Optimizing Crop Selection via Agronomic and Economic Forecasting; Jul 6, 2025; Hybrid Machine Learning, speech-recognition; Unverified 0
DeepGesture: A conversational gesture synthesis system based on emotions and semantics; Jul 3, 2025; Gesture Generation, Motion Synthesis; Code Available 0
OpusLM: A Family of Open Unified Speech Language Models; Jun 21, 2025; Decoder, speech-recognition; Unverified 0
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching; Jun 20, 2025; Scheduling, Speech Synthesis; Code Available 2
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems; Jun 19, 2025; Benchmarking, Descriptive; Code Available 1
An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW; Jun 18, 2025; Optical Character Recognition (OCR); Unverified 0
Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models; Jun 17, 2025; Kolmogorov-Arnold Networks, Self-Supervised Learning; Code Available 0
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching; Jun 16, 2025; Decoder, Speech Synthesis; Code Available 4
From Flat to Feeling: A Feasibility and Impact Study on Dynamic Facial Emotions in AI-Generated Avatars; Jun 16, 2025; GPU, Speech Synthesis; Unverified 0
S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation; Jun 11, 2025; Reading Comprehension, Speech Synthesis; Unverified 0
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching; Jun 11, 2025; Speech Synthesis, text-to-speech; Unverified 0
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment; Jun 11, 2025; cross-modal alignment, Question Answering; Code Available 0
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model; Jun 10, 2025; Language Modeling; Code Available 7
Seeing Voices: Generating A-Roll Video from Audio with Mirage; Jun 9, 2025; Speech Synthesis, text-to-speech; Unverified 0
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset; Jun 4, 2025; Speech Synthesis, text-to-speech; Unverified 0
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions; Jun 4, 2025; Data Augmentation, Diversity; Unverified 0
Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions; Jun 3, 2025; Expressive Speech Synthesis, Prompt Learning; Unverified 0
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech; Jun 3, 2025; Speech Synthesis, text-to-speech; Unverified 0
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction; Jun 2, 2025; Speech Synthesis, text-to-speech; Unverified 0
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models; Jun 1, 2025; counterfactual, Speech Synthesis; Unverified 0
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems; May 31, 2025; Language Modeling; Unverified 0
BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models; May 28, 2025; Speech Synthesis; Unverified 0
ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis; May 26, 2025; DeepFake Detection, Face Swapping; Unverified 0
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling; May 26, 2025; Sentence, Speech Synthesis; Unverified 0
GSA-TTS: Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor; May 26, 2025; Speech Synthesis; Unverified 0
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech; May 26, 2025; Attribute, Emotional Speech Synthesis; Unverified 0
Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis; May 25, 2025; Speech Synthesis, text-to-speech; Unverified 0
RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations; May 24, 2025; Expressive Speech Synthesis, Speech Synthesis; Unverified 0
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training; May 23, 2025; Automatic Speech Recognition, Emotion Recognition; Code Available 11
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling; May 21, 2025; Emotion Recognition, Face Detection; Unverified 0
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding; May 21, 2025; Speech Synthesis; Unverified 0
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models; May 21, 2025; Bayesian Optimization, Speech Synthesis; Code Available 1
Pairwise Evaluation of Accent Similarity in Speech Synthesis; May 20, 2025; Speech Synthesis; Unverified 0
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation; May 20, 2025; Dataset Generation, Speech Synthesis; Unverified 0
Articulatory Feature Prediction from Surface EMG during Speech Production; May 20, 2025; Electromyography (EMG), Speech Synthesis; Code Available 0
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space; May 19, 2025; Language Modeling; Code Available 2
RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations; May 19, 2025; Speaker Verification, Speech Enhancement; Unverified 0
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching; May 19, 2025; Attribute, Speech Synthesis; Unverified 0
Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis; May 18, 2025; Speech Synthesis, text-to-speech; Unverified 0
UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech; May 15, 2025; Emotional Speech Synthesis, Language Modeling; Unverified 0
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis; May 14, 2025; Audio Generation, Audio Synthesis; Unverified 0
Investigating self-supervised features for expressive, multilingual voice conversion; May 13, 2025; Self-Supervised Learning, Speech Synthesis; Unverified 0
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications; May 12, 2025; Speech Synthesis, text-to-speech; Unverified 0
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis; May 5, 2025; Chatbot, Decoder; Code Available 3
Towards Flow-Matching-based TTS without Classifier-Free Guidance; Apr 29, 2025; Speech Synthesis, text-to-speech; Unverified 0
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation; Apr 29, 2025; In-Context Learning, Speech Synthesis; Unverified 0
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements; Apr 27, 2025; Generative Adversarial Network, Speech Synthesis; Unverified 0
FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning; Apr 22, 2025; Deep Learning, Speaker Verification; Unverified 0