Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Mar 3, 2025 | Attribute, text-to-speech | Code Available | 11
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Feb 8, 2025 | Decoder, Language Modeling | Code Available | 11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Jul 7, 2024 | Language Modelling, Large Language Model | Code Available | 11
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Jul 4, 2024 | Emotion Recognition, Event Detection | Code Available | 11
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction | Feb 17, 2025 | Instruction Following, Voice Cloning | Code Available | 7
OpenVoice: Versatile Instant Voice Cloning | Dec 3, 2023 | Rhythm, Voice Cloning | Code Available | 7
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Nov 7, 2022 | Representation Learning, Speech Representation Learning | Code Available | 6
Proactive Detection of Voice Cloning with Localized Watermarking | Jan 30, 2024 | Voice Cloning | Code Available | 4
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Apr 18, 2023 | Audio Generation, Expressive Speech Synthesis | Code Available | 4
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation | Feb 18, 2025 | Voice Cloning | Code Available | 3
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Jul 9, 2019 | Speech Synthesis, text-to-speech | Code Available | 3
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Oct 30, 2024 | Speech Synthesis, text-to-speech | Code Available | 2
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Oct 1, 2024 | Emotional Speech Synthesis, Speech Synthesis | Code Available | 2
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Jun 6, 2024 | Decoder, Inductive Bias | Code Available | 2
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing | Feb 20, 2024 | Voice Cloning | Code Available | 2
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Sep 23, 2024 | Language Modeling, Language Modelling | Code Available | 1
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model | Jun 7, 2024 | text-to-speech, Text to Speech | Code Available | 1
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques | Aug 5, 2023 | Quantization, Speaker anonymization | Code Available | 1
Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features | Jul 15, 2023 | Voice Cloning | Code Available | 1
Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text | Jun 26, 2021 | Talking Face Generation, Talking Head Generation | Code Available | 1
Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss | Apr 22, 2021 | Voice Cloning, Voice Conversion | Code Available | 1
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech | Aug 3, 2020 | Meta-Learning, Speech Synthesis | Code Available | 1
Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison | Jul 15, 2025 | Voice Cloning | Unverified | 0
De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks | Jul 3, 2025 | Voice Cloning | Unverified | 0
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes | May 29, 2025 | Audio Deepfake Detection, DeepFake Detection | Code Available | 0
Voice Adaptation for Swiss German | May 28, 2025 | Voice Cloning | Unverified | 0
VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents | May 27, 2025 | Voice Cloning | Unverified | 0
Phir Hera Fairy: An English Fairytaler is a Strong Faker of Fluent Speech in Low-Resource Indian Languages | May 27, 2025 | Synthetic Data Generation, Voice Cloning | Unverified | 0
CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning | May 25, 2025 | text-to-speech, Text to Speech | Unverified | 0
Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection | May 22, 2025 | DeepFake Detection, Face Swapping | Unverified | 0
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | May 21, 2025 | Emotion Recognition, Face Detection | Unverified | 0
VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning | May 18, 2025 | Representation Learning, Voice Cloning | Unverified | 0
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder | May 12, 2025 | text-to-speech, Text to Speech | Unverified | 0
Voice Cloning: Comprehensive Survey | May 1, 2025 | Survey, Voice Cloning | Unverified | 0
ClonEval: An Open Voice Cloning Benchmark | Apr 29, 2025 | text-to-speech, Text to Speech | Code Available | 0
"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services | Apr 12, 2025 | Voice Cloning | Unverified | 0
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | Apr 10, 2025 | Speech Synthesis, text-to-speech | Unverified | 0
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development | Mar 31, 2025 | Speech Synthesis, Voice Cloning | Code Available | 0
SoK: How Robust is Audio Watermarking in Generative AI models? | Mar 24, 2025 | Voice Cloning | Unverified | 0
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology | Mar 3, 2025 | Speech Synthesis, Voice Cloning | Unverified | 0
Steganography Beyond Space-Time with Chain of Multimodal AI | Feb 25, 2025 | Face Swapping, Text Generation | Unverified | 0
Deepfake Technology Unveiled: The Commoditization of AI and Its Impact on Digital Trust | Jan 24, 2025 | Face Swapping, Misinformation | Unverified | 0
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement | Jan 15, 2025 | Computational Efficiency, CPU | Unverified | 0
MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model | Jan 10, 2025 | Decoder, Language Modelling | Unverified | 0
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | Dec 25, 2024 | text-to-speech, Text to Speech | Unverified | 0
Speech Watermarking with Discrete Intermediate Representations | Dec 18, 2024 | Voice Cloning | Unverified | 0
Parallel Stacked Aggregated Network for Voice Authentication in IoT-Enabled Smart Devices | Nov 29, 2024 | Voice Anti-spoofing, Voice Cloning | Unverified | 0
Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset | Nov 23, 2024 | DeepFake Detection, Face Swapping | Unverified | 0
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings | Oct 31, 2024 | Voice Cloning | Unverified | 0
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | Oct 14, 2024 | Denoising, Speaker Verification | Unverified | 0