SOTAVerified

Voice Cloning

Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person’s voice from only a few audio samples.

Papers

Showing 1-25 of 112 papers

Title | Status | Hype
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Code | 11
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Code | 11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Code | 11
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Code | 11
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction | Code | 7
OpenVoice: Versatile Instant Voice Cloning | Code | 7
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Code | 6
Proactive Detection of Voice Cloning with Localized Watermarking | Code | 4
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Code | 4
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation | Code | 3
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Code | 3
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Code | 2
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Code | 2
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing | Code | 2
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model | Code | 1
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques | Code | 1
Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features | Code | 1
Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text | Code | 1
Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss | Code | 1
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech | Code | 1
Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison |  | 0
De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks |  | 0
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes | Code | 0
Page 1 of 5

No leaderboard results yet.