SOTAVerified

Voice Cloning

Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person’s voice from only a few audio samples.

Papers

Showing 1-50 of 112 papers

Title | Status | Hype
Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison | - | 0
De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks | - | 0
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes | Code | 0
Voice Adaptation for Swiss German | - | 0
Phir Hera Fairy: An English Fairytaler is a Strong Faker of Fluent Speech in Low-Resource Indian Languages | - | 0
VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents | - | 0
CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning | - | 0
Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection | - | 0
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | - | 0
VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning | - | 0
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder | - | 0
Voice Cloning: Comprehensive Survey | - | 0
ClonEval: An Open Voice Cloning Benchmark | Code | 0
"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services | - | 0
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | - | 0
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development | Code | 0
SoK: How Robust is Audio Watermarking in Generative AI models? | - | 0
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Code | 11
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology | - | 0
Steganography Beyond Space-Time with Chain of Multimodal AI | - | 0
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation | Code | 3
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction | Code | 7
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Code | 11
Deepfake Technology Unveiled: The Commoditization of AI and Its Impact on Digital Trust | - | 0
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement | - | 0
MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model | - | 0
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | - | 0
Speech Watermarking with Discrete Intermediate Representations | - | 0
Parallel Stacked Aggregated Network for Voice Authentication in IoT-Enabled Smart Devices | - | 0
Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset | - | 0
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings | - | 0
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | - | 0
Can DeepFake Speech be Reliably Detected? | - | 0
Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems | - | 0
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Code | 2
Augmentation through Laundering Attacks for Audio Spoof Detection | - | 0
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1
Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space | - | 0
Multi-modal Adversarial Training for Zero-Shot Voice Cloning | - | 0
Is Audio Spoof Detection Robust to Laundering Attacks? | Code | 0
kNN Retrieval for Simple and Effective Zero-Shot Multi-speaker Text-to-Speech | - | 0
Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language | - | 0
WavLM model ensemble for audio deepfake detection | Code | 0
Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems | - | 0
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Code | 11
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Code | 11
A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge | - | 0
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | - | 0
Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech | - | 0

No leaderboard results yet.