IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System · Feb 8, 2025 · Decoder, Language Modeling
Code available (115) · Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens · Mar 3, 2025 · Attribute, text-to-speech
Code available (115) · CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens · Jul 7, 2024 · Language Modelling, Large Language Model
Code available (115) · FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs · Jul 4, 2024 · Emotion Recognition, Event Detection
Code available (115) · OpenVoice: Versatile Instant Voice Cloning · Dec 3, 2023 · Rhythm, Voice Cloning
Code available (75) · Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction · Feb 17, 2025 · Instruction Following, Voice Cloning
Code available (75) · ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech · Nov 7, 2022 · Representation Learning, Speech Representation Learning
Code available (65) · Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert · Apr 18, 2023 · Audio Generation, Expressive Speech Synthesis
Code available (45) · Proactive Detection of Voice Cloning with Localized Watermarking · Jan 30, 2024 · Voice Cloning
Code available (45) · Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning · Jul 9, 2019 · Speech Synthesis, text-to-speech
Code available (35) · SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation · Feb 18, 2025 · Voice Cloning
Code available (35) · EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control · Oct 1, 2024 · Emotional Speech Synthesis, Speech Synthesis
Code available (25) · StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing · Feb 20, 2024 · Voice Cloning
Code available (25) · Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis · Jun 6, 2024 · Decoder, Inductive Bias
Code available (25) · Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis · Oct 30, 2024 · Speech Synthesis, text-to-speech
Code available (25) · Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss · Apr 22, 2021 · Voice Cloning, Voice Conversion
Code available (15) · Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques · Aug 5, 2023 · Quantization, Speaker anonymization
Code available (15) · Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text · Jun 26, 2021 · Talking Face Generation, Talking Head Generation
Code available (15) · XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model · Jun 7, 2024 · text-to-speech, Text to Speech
Code available (15) · One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech · Aug 3, 2020 · Meta-Learning, Speech Synthesis
Code available (15) · Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features · Jul 15, 2023 · Voice Cloning
Code available (15) · LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation · Sep 23, 2024 · Language Modeling, Language Modelling
Code available (15) · Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis · Oct 14, 2022 · Speech Synthesis, Voice Cloning
Code available (05) · Dictionary Attacks on Speaker Verification · Apr 24, 2022 · Speaker Verification, Voice Cloning
Code available (05) · WavLM model ensemble for audio deepfake detection · Aug 14, 2024 · Audio Deepfake Detection, Data Augmentation
Code available (05) · Is Audio Spoof Detection Robust to Laundering Attacks? · Aug 27, 2024 · Voice Cloning
Code available (05) · Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech · Mar 6, 2021 · text-to-speech, Text to Speech
Code available (05) · SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development · Mar 31, 2025 · Speech Synthesis, Voice Cloning
Code available (05) · Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis · Jun 12, 2018 · Speaker Verification, Speech Synthesis
Code available (05) · SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines · Nov 6, 2021 · Disentanglement, Speaker Verification
Code available (05) · Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes · May 29, 2025 · Audio Deepfake Detection, DeepFake Detection
Code available (05) · PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset · May 14, 2024 · DeepFake Detection, Face Swapping
Code available (05) · Neural Voice Cloning with a Few Samples · Feb 14, 2018 · Speech Synthesis, Voice Cloning
Code available (05) · ClonEval: An Open Voice Cloning Benchmark · Apr 29, 2025 · text-to-speech, Text to Speech
Code available (05) · Low-Resource Multilingual and Zero-Shot Multispeaker TTS · Oct 21, 2022 · Meta-Learning, text-to-speech
Code available (05) · Discovery of Single Independent Latent Variable · Oct 12, 2021 · Image Generation, Voice Cloning
Code available (05) · Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis · Apr 10, 2025 · Speech Synthesis, text-to-speech
Unverified (00) · Can DeepFake Speech be Reliably Detected? · Oct 9, 2024 · Face Swapping, Misinformation
Unverified (00) · Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language · Aug 19, 2024 · Transfer Learning, Voice Cloning
Unverified (00) · DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing · Jun 13, 2024 · Language Modeling, Language Modelling
Unverified (00) · DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis · Oct 14, 2024 · Denoising, Speaker Verification
Unverified (00) · Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection · May 22, 2025 · DeepFake Detection, Face Swapping
Unverified (00) · Latent linguistic embedding for cross-lingual text-to-speech and voice conversion · Oct 8, 2020 · text-to-speech, Text to Speech
Unverified (00) · Deepfake Technology Unveiled: The Commoditization of AI and Its Impact on Digital Trust · Jan 24, 2025 · Face Swapping, Misinformation
Unverified (00) · Augmentation through Laundering Attacks for Audio Spoof Detection · Oct 1, 2024 · Data Augmentation, Face Swapping
Unverified (00) · Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset · Dec 25, 2024 · text-to-speech, Text to Speech
Unverified (00) · Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices · Jun 11, 2024 · Ethics, Fairness
Unverified (00) · "It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services · Apr 12, 2025 · Voice Cloning
Unverified (00) · De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks · Jul 3, 2025 · Voice Cloning
Unverified (00) · Data Efficient Voice Cloning for Neural Singing Synthesis · Feb 19, 2019 · text-to-speech, Text to Speech
Unverified (00)