SOTAVerified

Voice Cloning

Voice cloning is a highly desired feature for personalized speech interfaces. A neural voice cloning system learns to synthesize a person’s voice from only a few audio samples.

Papers

Showing 26–50 of 112 papers

| Title | Status | Hype |
|---|---|---|
| MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model | | 0 |
| Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | | 0 |
| Speech Watermarking with Discrete Intermediate Representations | | 0 |
| Parallel Stacked Aggregated Network for Voice Authentication in IoT-Enabled Smart Devices | | 0 |
| Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset | | 0 |
| The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings | | 0 |
| Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2 |
| DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | | 0 |
| Can DeepFake Speech be Reliably Detected? | | 0 |
| Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems | | 0 |
| EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Code | 2 |
| Augmentation through Laundering Attacks for Audio Spoof Detection | | 0 |
| LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1 |
| Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space | | 0 |
| Multi-modal Adversarial Training for Zero-Shot Voice Cloning | | 0 |
| Is Audio Spoof Detection Robust to Laundering Attacks? | Code | 0 |
| kNN Retrieval for Simple and Effective Zero-Shot Multi-speaker Text-to-Speech | | 0 |
| Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language | | 0 |
| WavLM model ensemble for audio deepfake detection | Code | 0 |
| Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems | | 0 |
| CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Code | 11 |
| FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Code | 11 |
| A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge | | 0 |
| DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | | 0 |
| Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech | | 0 |
Page 2 of 5

No leaderboard results yet.