
Voice Cloning

Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person’s voice from only a few audio samples.

Papers


Showing 1–10 of 112 papers

Title	Date	Tasks	Status	Hype	Score
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens	Mar 3, 2025	Attribute, text-to-speech	Code Available	11	5
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs	Jul 4, 2024	Emotion Recognition, Event Detection	Code Available	11	5
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System	Feb 8, 2025	Decoder, Language Modeling	Code Available	11	5
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens	Jul 7, 2024	Language Modelling, Large Language Model	Code Available	11	5
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction	Feb 17, 2025	Instruction Following, Voice Cloning	Code Available	7	5
OpenVoice: Versatile Instant Voice Cloning	Dec 3, 2023	Rhythm, Voice Cloning	Code Available	7	5
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech	Nov 7, 2022	Representation Learning, Speech Representation Learning	Code Available	6	5
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert	Apr 18, 2023	Audio Generation, Expressive Speech Synthesis	Code Available	4	5
Proactive Detection of Voice Cloning with Localized Watermarking	Jan 30, 2024	Voice Cloning	Code Available	4	5
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning	Jul 9, 2019	Speech Synthesis, text-to-speech	Code Available	3	5


No leaderboard results yet.