SOTAVerified

Voice Cloning

Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person’s voice from only a few audio samples.

Papers

Showing 1-50 of 112 papers

Title | Status | Hype
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Code | 11
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Code | 11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Code | 11
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Code | 11
OpenVoice: Versatile Instant Voice Cloning | Code | 7
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction | Code | 7
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Code | 6
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Code | 4
Proactive Detection of Voice Cloning with Localized Watermarking | Code | 4
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Code | 3
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation | Code | 3
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Code | 2
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing | Code | 2
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Code | 2
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2
Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss | Code | 1
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques | Code | 1
Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text | Code | 1
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model | Code | 1
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech | Code | 1
Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features | Code | 1
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis | Code | 0
Dictionary Attacks on Speaker Verification | Code | 0
WavLM model ensemble for audio deepfake detection | Code | 0
Is Audio Spoof Detection Robust to Laundering Attacks? | Code | 0
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech | Code | 0
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development | Code | 0
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | Code | 0
SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines | Code | 0
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes | Code | 0
PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset | Code | 0
Neural Voice Cloning with a Few Samples | Code | 0
ClonEval: An Open Voice Cloning Benchmark | Code | 0
Low-Resource Multilingual and Zero-Shot Multispeaker TTS | Code | 0
Discovery of Single Independent Latent Variable | Code | 0
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | - | 0
Can DeepFake Speech be Reliably Detected? | - | 0
Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language | - | 0
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | - | 0
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | - | 0
Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection | - | 0
Latent linguistic embedding for cross-lingual text-to-speech and voice conversion | - | 0
Deepfake Technology Unveiled: The Commoditization of AI and Its Impact on Digital Trust | - | 0
Augmentation through Laundering Attacks for Audio Spoof Detection | - | 0
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | - | 0
Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices | - | 0
"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services | - | 0
De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks | - | 0
Data Efficient Voice Cloning for Neural Singing Synthesis | - | 0

No leaderboard results yet.