S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension, Speech Synthesis | Unverified
Seeing Voices: Generating A-Roll Video from Audio with Mirage | Jun 9, 2025 | Speech Synthesis, text-to-speech | Unverified
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset | Jun 4, 2025 | Speech Synthesis, text-to-speech | Unverified
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | Jun 4, 2025 | Data Augmentation, Diversity | Unverified
Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions | Jun 3, 2025 | Expressive Speech Synthesis, Prompt Learning | Unverified
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | Jun 3, 2025 | Speech Synthesis, text-to-speech | Unverified
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction | Jun 2, 2025 | Speech Synthesis, text-to-speech | Unverified
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models | Jun 1, 2025 | counterfactual, Speech Synthesis | Unverified
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems | May 31, 2025 | Language Modeling, Language Modelling | Unverified
BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models | May 28, 2025 | Speech Synthesis | Unverified
ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis | May 26, 2025 | DeepFake Detection, Face Swapping | Unverified
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech | May 26, 2025 | Attribute, Emotional Speech Synthesis | Unverified
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling | May 26, 2025 | Sentence, Speech Synthesis | Unverified
GSA-TTS: Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor | May 26, 2025 | Speech Synthesis | Unverified
Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis | May 25, 2025 | Speech Synthesis, text-to-speech | Unverified
RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations | May 24, 2025 | Expressive Speech Synthesis, Speech Synthesis | Unverified
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding | May 21, 2025 | Speech Synthesis | Unverified
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | May 21, 2025 | Emotion Recognition, Face Detection | Unverified
Articulatory Feature Prediction from Surface EMG during Speech Production | May 20, 2025 | Electromyography (EMG), Speech Synthesis | Code Available
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation | May 20, 2025 | Dataset Generation, Speech Synthesis | Unverified
Pairwise Evaluation of Accent Similarity in Speech Synthesis | May 20, 2025 | Speech Synthesis | Unverified
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching | May 19, 2025 | Attribute, Speech Synthesis | Unverified
RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations | May 19, 2025 | Speaker Verification, Speech Enhancement | Unverified
Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis | May 18, 2025 | Speech Synthesis, text-to-speech | Unverified
UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech | May 15, 2025 | Emotional Speech Synthesis, Language Modeling | Unverified
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis | May 14, 2025 | Audio Generation, Audio Synthesis | Unverified
Investigating self-supervised features for expressive, multilingual voice conversion | May 13, 2025 | Self-Supervised Learning, Speech Synthesis | Unverified
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications | May 12, 2025 | Speech Synthesis, text-to-speech | Unverified
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation | Apr 29, 2025 | In-Context Learning, Speech Synthesis | Unverified
Towards Flow-Matching-based TTS without Classifier-Free Guidance | Apr 29, 2025 | Speech Synthesis, text-to-speech | Unverified
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements | Apr 27, 2025 | Generative Adversarial Network, Speech Synthesis | Unverified
FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning | Apr 22, 2025 | Deep Learning, Speaker Verification | Unverified
A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | Apr 22, 2025 | cross-modal alignment, Script Generation | Unverified
SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation | Apr 21, 2025 | parameter-efficient fine-tuning, Speech Synthesis | Unverified
DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue | Apr 20, 2025 | Diversity, Speech Synthesis | Code Available
Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion | Apr 18, 2025 | Generative Adversarial Network, Image Generation | Unverified
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis | Apr 14, 2025 | Language Modeling, Language Modelling | Unverified
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis | Apr 14, 2025 | RAG, Retrieval-augmented Generation | Unverified
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis | Apr 12, 2025 | Speech Synthesis | Unverified
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow | Apr 10, 2025 | Speech Synthesis, text-to-speech | Unverified
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | Apr 10, 2025 | Speech Synthesis, text-to-speech | Unverified
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models | Apr 3, 2025 | Speech Synthesis | Unverified
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development | Mar 31, 2025 | Speech Synthesis, Voice Cloning | Code Available
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System | Mar 29, 2025 | Speech Synthesis, text-to-speech | Unverified
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech | Mar 21, 2025 | Speech Synthesis | Unverified
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility | Mar 7, 2025 | Speech Synthesis | Code Available
Good practices for evaluation of synthesized speech | Mar 5, 2025 | Speech Synthesis | Unverified
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology | Mar 3, 2025 | Speech Synthesis, Voice Cloning | Unverified
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models | Feb 27, 2025 | Diversity, Language Modeling | Unverified
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis | Feb 26, 2025 | Speech Synthesis, text-to-speech | Unverified