A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models (Apr 22, 2025). Tasks: cross-modal alignment, Script Generation. Unverified 0.
SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation (Apr 21, 2025). Tasks: parameter-efficient fine-tuning, Speech Synthesis. Unverified 0.
DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue (Apr 20, 2025). Tasks: Diversity, Speech Synthesis. Code available 0.
Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion (Apr 18, 2025). Tasks: Generative Adversarial Network, Image Generation. Unverified 0.
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis (Apr 14, 2025). Tasks: Language Modeling, Language Modelling. Unverified 0.
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis (Apr 14, 2025). Tasks: RAG, Retrieval-augmented Generation. Unverified 0.
SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis (Apr 14, 2025). Tasks: Face Swapping, Speech Synthesis. Code available 1.
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis (Apr 12, 2025). Tasks: Speech Synthesis. Unverified 0.
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis (Apr 10, 2025). Tasks: Speech Synthesis, text-to-speech. Unverified 0.
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow (Apr 10, 2025). Tasks: Speech Synthesis, text-to-speech. Unverified 0.
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models (Apr 3, 2025). Tasks: Speech Synthesis. Unverified 0.
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development (Mar 31, 2025). Tasks: Speech Synthesis, Voice Cloning. Code available 0.
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System (Mar 29, 2025). Tasks: Speech Synthesis, text-to-speech. Unverified 0.
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech (Mar 21, 2025). Tasks: Speech Synthesis. Unverified 0.
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching (Mar 20, 2025). Tasks: Speech Synthesis. Code available 2.
MoonCast: High-Quality Zero-Shot Podcast Generation (Mar 18, 2025). Tasks: Speech Synthesis, text-to-speech. Code available 3.
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility (Mar 7, 2025). Tasks: Speech Synthesis. Code available 0.
Good practices for evaluation of synthesized speech (Mar 5, 2025). Tasks: Speech Synthesis. Unverified 0.
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology (Mar 3, 2025). Tasks: Speech Synthesis, Voice Cloning. Unverified 0.
PodAgent: A Comprehensive Framework for Podcast Generation (Mar 1, 2025). Tasks: Audio Generation, Speech Synthesis. Code available 2.
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models (Feb 27, 2025). Tasks: Diversity, Language Modeling. Unverified 0.
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis (Feb 26, 2025). Tasks: Speech Synthesis, text-to-speech. Unverified 0.
Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM (Feb 24, 2025). Tasks: Automatic Speech Recognition, Language Modeling. Unverified 0.
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions (Feb 18, 2025). Tasks: Speech Synthesis. Unverified 0.
High-Fidelity Music Vocoder using Neural Audio Codecs (Feb 18, 2025). Tasks: Decoder, Speech Synthesis. Unverified 0.
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing (Feb 17, 2025). Tasks: Lip to Speech Synthesis, speech-recognition. Unverified 0.
A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond (Feb 17, 2025). Tasks: Contrastive Learning, EEG. Unverified 0.
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching (Feb 16, 2025). Tasks: Language Modeling, Language Modelling. Unverified 0.
ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech (Feb 13, 2025). Tasks: Adversarial Attack, Adversarial Attack Detection. Unverified 0.
LoRP-TTS: Low-Rank Personalized Text-To-Speech (Feb 11, 2025). Tasks: Speech Synthesis, text-to-speech. Unverified 0.
Non-invasive electromyographic speech neuroprosthesis: a geometric perspective (Feb 9, 2025). Tasks: Speech Synthesis. Unverified 0.
Gender Bias in Instruction-Guided Speech Synthesis Models (Feb 8, 2025). Tasks: Expressive Speech Synthesis, Speech Synthesis. Unverified 0.
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis (Feb 6, 2025). Tasks: Speech Synthesis. Code available 4.
Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet (Feb 4, 2025). Tasks: Speech Synthesis, text-to-speech. Code available 1.
Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis (Feb 3, 2025). Tasks: Quantization, Speech Synthesis. Unverified 0.
Compact Neural TTS Voices for Accessibility (Jan 28, 2025). Tasks: Speech Synthesis, text-to-speech. Unverified 0.
Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation (Jan 24, 2025). Tasks: Audio Deepfake Detection, DeepFake Detection. Unverified 0.
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement (Jan 23, 2025). Tasks: Data Augmentation, Speech Enhancement. Unverified 0.
A Non-autoregressive Model for Joint STT and TTS (Jan 15, 2025). Tasks: Automatic Speech Recognition, speech-recognition. Unverified 0.
Speech Synthesis along Perceptual Voice Quality Dimensions (Jan 15, 2025). Tasks: Expressive Speech Synthesis, Speech Synthesis. Unverified 0.
Exploring the encoding of linguistic representations in the Fully-Connected Layer of generative CNNs for Speech (Jan 13, 2025). Tasks: Speech Synthesis. Unverified 0.
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis (Jan 11, 2025). Tasks: Attribute, Benchmarking. Code available 1.
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control (Jan 10, 2025). Tasks: Speech Synthesis, text-to-speech. Unverified 0.
Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron (Jan 10, 2025). Tasks: Speech Synthesis, text-to-speech. Unverified 0.
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer (Jan 10, 2025). Tasks: speech-recognition, Speech Recognition. Unverified 0.
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder (Jan 9, 2025). Tasks: Pitch Classification, Pitch control. Code available 1.
Probing Speaker-specific Features in Speaker Representations (Jan 9, 2025). Tasks: Self-Supervised Learning, Speaker Verification. Unverified 0.
JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis (Jan 9, 2025). Tasks: Emotion Recognition, Language Modeling. Unverified 0.
FleSpeech: Flexibly Controllable Speech Generation with Various Prompts (Jan 8, 2025). Tasks: Speech Synthesis. Unverified 0.
OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis (Jan 8, 2025). Tasks: Decoder, Emotional Speech Synthesis. Code available 2.