FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Jul 4, 2024 | Emotion Recognition, Event Detection
Code Available | 11 | Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Mar 3, 2025 | Attribute, text-to-speech
Code Available | 11 | IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Feb 8, 2025 | Decoder, Language Modeling
Code Available | 11 | CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Jul 7, 2024 | Language Modelling, Large Language Model
Code Available | 11 | OpenVoice: Versatile Instant Voice Cloning | Dec 3, 2023 | Rhythm, Voice Cloning
Code Available | 7 | Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction | Feb 17, 2025 | Instruction Following, Voice Cloning
Code Available | 7 | ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Nov 7, 2022 | Representation Learning, Speech Representation Learning
Code Available | 6 | Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Apr 18, 2023 | Audio Generation, Expressive Speech Synthesis
Code Available | 4 | Proactive Detection of Voice Cloning with Localized Watermarking | Jan 30, 2024 | Voice Cloning
Code Available | 4 | Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Jul 9, 2019 | Speech Synthesis, text-to-speech
Code Available | 3 | SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation | Feb 18, 2025 | Voice Cloning
Code Available | 3 | EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Oct 1, 2024 | Emotional Speech Synthesis, Speech Synthesis
Code Available | 2 | StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing | Feb 20, 2024 | Voice Cloning
Code Available | 2 | Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Oct 30, 2024 | Speech Synthesis, text-to-speech
Code Available | 2 | Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Jun 6, 2024 | Decoder, Inductive Bias
Code Available | 2 | Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss | Apr 22, 2021 | Voice Cloning, Voice Conversion
Code Available | 1 | Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques | Aug 5, 2023 | Quantization, Speaker anonymization
Code Available | 1 | LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Sep 23, 2024 | Language Modeling, Language Modelling
Code Available | 1 | XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model | Jun 7, 2024 | text-to-speech, Text to Speech
Code Available | 1 | Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text | Jun 26, 2021 | Talking Face Generation, Talking Head Generation
Code Available | 1 | Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features | Jul 15, 2023 | Voice Cloning
Code Available | 1 | One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech | Aug 3, 2020 | Meta-Learning, Speech Synthesis
Code Available | 1 | Can DeepFake Speech be Reliably Detected? | Oct 9, 2024 | Face Swapping, Misinformation
Unverified | 0 | Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language | Aug 19, 2024 | Transfer Learning, Voice Cloning
Unverified | 0 | MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI | Nov 20, 2023 | Chatbot, Prompt Engineering
Unverified | 0 | DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | Jun 13, 2024 | Language Modeling, Language Modelling
Unverified | 0 | DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | Oct 14, 2024 | Denoising, Speaker Verification
Unverified | 0 | Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection | May 22, 2025 | DeepFake Detection, Face Swapping
Unverified | 0 | Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | Apr 10, 2025 | Speech Synthesis, text-to-speech
Unverified | 0 | Deepfake Technology Unveiled: The Commoditization of AI and Its Impact on Digital Trust | Jan 24, 2025 | Face Swapping, Misinformation
Unverified | 0 | Latent linguistic embedding for cross-lingual text-to-speech and voice conversion | Oct 8, 2020 | text-to-speech, Text to Speech
Unverified | 0 | Augmentation through Laundering Attacks for Audio Spoof Detection | Oct 1, 2024 | Data Augmentation, Face Swapping
Unverified | 0 | Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | Dec 25, 2024 | text-to-speech, Text to Speech
Unverified | 0 | Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning | Nov 14, 2021 | Disentanglement, Meta-Learning
Unverified | 0 | De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks | Jul 3, 2025 | Voice Cloning
Unverified | 0 | Data Efficient Voice Cloning for Neural Singing Synthesis | Feb 19, 2019 | text-to-speech, Text to Speech
Unverified | 0 | Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis | Jan 22, 2024 | Speaker Verification, Speech Synthesis
Unverified | 0 | Improve few-shot voice cloning using multi-modal learning | Mar 18, 2022 | text-to-speech, Text to Speech
Unverified | 0 | CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge | Mar 8, 2021 | Voice Cloning
Unverified | 0 | Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data | Oct 14, 2021 | text-to-speech, Text to Speech
Unverified | 0 | Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset | Nov 23, 2024 | DeepFake Detection, Face Swapping
Unverified | 0 | Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers | Nov 26, 2019 | Speech Synthesis, text-to-speech
Unverified | 0 | A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge | Jun 22, 2024 | Speech Synthesis, text-to-speech
Unverified | 0 | MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model | Jan 10, 2025 | Decoder, Language Modelling
Unverified | 0 | "It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services | Apr 12, 2025 | Voice Cloning
Unverified | 0 | Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices | Jun 11, 2024 | Ethics, Fairness
Unverified | 0 | High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models | Sep 27, 2023 | All, Speech Synthesis
Unverified | 0 | Collaborative Watermarking for Adversarial Speech Synthesis | Sep 26, 2023 | Speaker Verification, Speech Synthesis
Unverified | 0 | Expressive Neural Voice Cloning | Jan 30, 2021 | Speech Synthesis, Style Transfer
Unverified | 0 | Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems | Oct 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Unverified | 0