SOTAVerified

Voice Cloning

Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person’s voice from only a few audio samples.
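One common way to work from "only a few audio samples" is speaker encoding. Each reference clip is mapped to a fixed-size embedding, and the embeddings are pooled into a single speaker vector that conditions a multi-speaker synthesizer. The sketch below shows only the pooling step, under stated assumptions. The per-clip embeddings are assumed to come from a pretrained speaker encoder that is not shown. The helper names `speaker_embedding` and `cosine_similarity` are hypothetical and do not come from any paper listed here.

```python
import math
from typing import Sequence

Vector = list[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def speaker_embedding(utterance_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Pool per-clip embeddings from a few reference samples into one
    unit-length speaker vector.

    Assumption: each input vector was produced by a pretrained speaker
    encoder (hypothetical here, not implemented).
    """
    if not utterance_embeddings:
        raise ValueError("need at least one reference sample")
    dim = len(utterance_embeddings[0])
    if any(len(e) != dim for e in utterance_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    # Normalize each clip first so every clip contributes equally,
    # regardless of the magnitude of its embedding.
    normed = [l2_normalize(e) for e in utterance_embeddings]
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors, in [-1, 1]."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


# Toy usage: three 3-dimensional "clip embeddings" of the same speaker.
refs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.1, 0.0, 0.1]]
spk = speaker_embedding(refs)
same_speaker = cosine_similarity(spk, [1.0, 0.1, 0.05])
other_speaker = cosine_similarity(spk, [0.0, 0.0, 1.0])
print(f"same: {same_speaker:.3f}, other: {other_speaker:.3f}")
```

Averaging normalized embeddings is a simple, order-independent pooling choice. Real systems may instead use learned attention pooling or fine-tune the synthesizer on the samples (speaker adaptation). The toy vectors only illustrate that a new clip from the same voice scores closer to the pooled vector than a different voice does.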

Papers

Showing 1–50 of 112 papers

Title | Status | Hype
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Code | 11
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Code | 11
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Code | 11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Code | 11
OpenVoice: Versatile Instant Voice Cloning | Code | 7
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction | Code | 7
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Code | 6
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Code | 4
Proactive Detection of Voice Cloning with Localized Watermarking | Code | 4
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Code | 3
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation | Code | 3
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Code | 2
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing | Code | 2
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Code | 2
Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss | Code | 1
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques | Code | 1
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model | Code | 1
Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text | Code | 1
Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features | Code | 1
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech | Code | 1
Can DeepFake Speech be Reliably Detected? |  | 0
Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language |  | 0
MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI |  | 0
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing |  | 0
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis |  | 0
Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection |  | 0
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis |  | 0
Deepfake Technology Unveiled: The Commoditization of AI and Its Impact on Digital Trust |  | 0
Latent linguistic embedding for cross-lingual text-to-speech and voice conversion |  | 0
Augmentation through Laundering Attacks for Audio Spoof Detection |  | 0
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset |  | 0
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning |  | 0
De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks |  | 0
Data Efficient Voice Cloning for Neural Singing Synthesis |  | 0
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis |  | 0
Improve few-shot voice cloning using multi-modal learning |  | 0
CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge |  | 0
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data |  | 0
Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset |  | 0
Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers |  | 0
A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge |  | 0
MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model |  | 0
"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services |  | 0
Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices |  | 0
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models |  | 0
Collaborative Watermarking for Adversarial Speech Synthesis |  | 0
Expressive Neural Voice Cloning |  | 0
Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems |  | 0

No leaderboard results yet.