Voice Cloning

Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person’s voice from only a few audio samples.

Papers

Showing 1–50 of 112 papers

Title	Date	Tasks	Status	Hype	Score
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs	Jul 4, 2024	Emotion Recognition, Event Detection	Code Available	11	5
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System	Feb 8, 2025	Decoder, Language Modeling	Code Available	11	5
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens	Jul 7, 2024	Language Modelling, Large Language Model	Code Available	11	5
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens	Mar 3, 2025	Attribute, text-to-speech	Code Available	11	5
OpenVoice: Versatile Instant Voice Cloning	Dec 3, 2023	Rhythm, Voice Cloning	Code Available	7	5
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction	Feb 17, 2025	Instruction Following, Voice Cloning	Code Available	7	5
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech	Nov 7, 2022	Representation Learning, Speech Representation Learning	Code Available	6	5
Proactive Detection of Voice Cloning with Localized Watermarking	Jan 30, 2024	Voice Cloning	Code Available	4	5
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert	Apr 18, 2023	Audio Generation, Expressive Speech Synthesis	Code Available	4	5
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation	Feb 18, 2025	Voice Cloning	Code Available	3	5
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning	Jul 9, 2019	Speech Synthesis, text-to-speech	Code Available	3	5
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing	Feb 20, 2024	Voice Cloning	Code Available	2	5
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control	Oct 1, 2024	Emotional Speech Synthesis, Speech Synthesis	Code Available	2	5
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis	Jun 6, 2024	Decoder, Inductive Bias	Code Available	2	5
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis	Oct 30, 2024	Speech Synthesis, text-to-speech	Code Available	2	5
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques	Aug 5, 2023	Quantization, Speaker anonymization	Code Available	1	5
Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss	Apr 22, 2021	Voice Cloning, Voice Conversion	Code Available	1	5
Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text	Jun 26, 2021	Talking Face Generation, Talking Head Generation	Code Available	1	5
Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features	Jul 15, 2023	Voice Cloning	Code Available	1	5
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech	Aug 3, 2020	Meta-Learning, Speech Synthesis	Code Available	1	5
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation	Sep 23, 2024	Language Modeling, Language Modelling	Code Available	1	5
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model	Jun 7, 2024	text-to-speech, Text to Speech	Code Available	1	5
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis	Oct 14, 2022	Speech Synthesis, Voice Cloning	Code Available	0	5
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes	May 29, 2025	Audio Deepfake Detection, DeepFake Detection	Code Available	0	5
Dictionary Attacks on Speaker Verification	Apr 24, 2022	Speaker Verification, Voice Cloning	Code Available	0	5
Is Audio Spoof Detection Robust to Laundering Attacks?	Aug 27, 2024	Voice Cloning	Code Available	0	5
WavLM model ensemble for audio deepfake detection	Aug 14, 2024	Audio Deepfake Detection, Data Augmentation	Code Available	0	5
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis	Jun 12, 2018	Speaker Verification, Speech Synthesis	Code Available	0	5
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development	Mar 31, 2025	Speech Synthesis, Voice Cloning	Code Available	0	5
SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines	Nov 6, 2021	Disentanglement, Speaker Verification	Code Available	0	5
Discovery of Single Independent Latent Variable	Oct 12, 2021	Image Generation, Voice Cloning	Code Available	0	5
ClonEval: An Open Voice Cloning Benchmark	Apr 29, 2025	text-to-speech, Text to Speech	Code Available	0	5
Neural Voice Cloning with a Few Samples	Feb 14, 2018	Speech Synthesis, Voice Cloning	Code Available	0	5
PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset	May 14, 2024	DeepFake Detection, Face Swapping	Code Available	0	5
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis	Apr 10, 2025	Speech Synthesis, text-to-speech	Unverified	0	0
Can DeepFake Speech be Reliably Detected?	Oct 9, 2024	Face Swapping, Misinformation	Unverified	0	0
Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language	Aug 19, 2024	Transfer Learning, Voice Cloning	Unverified	0	0
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing	Jun 13, 2024	Language Modeling, Language Modelling	Unverified	0	0
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis	Oct 14, 2024	Denoising, Speaker Verification	Unverified	0	0
Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection	May 22, 2025	DeepFake Detection, Face Swapping	Unverified	0	0
Latent linguistic embedding for cross-lingual text-to-speech and voice conversion	Oct 8, 2020	text-to-speech, Text to Speech	Unverified	0	0
Deepfake Technology Unveiled: The Commoditization of AI and Its Impact on Digital Trust	Jan 24, 2025	Face Swapping, Misinformation	Unverified	0	0
Augmentation through Laundering Attacks for Audio Spoof Detection	Oct 1, 2024	Data Augmentation, Face Swapping	Unverified	0	0
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset	Dec 25, 2024	text-to-speech, Text to Speech	Unverified	0	0
Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices	Jun 11, 2024	Ethics, Fairness	Unverified	0	0
"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services	Apr 12, 2025	Voice Cloning	Unverified	0	0
De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks	Jul 3, 2025	Voice Cloning	Unverified	0	0
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech	Mar 6, 2021	text-to-speech, Text to Speech	Unverified	0	0
Data Efficient Voice Cloning for Neural Singing Synthesis	Feb 19, 2019	text-to-speech, Text to Speech	Unverified	0	0
Improve few-shot voice cloning using multi-modal learning	Mar 18, 2022	text-to-speech, Text to Speech	Unverified	0	0

No leaderboard results yet.