Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis | Jul 4, 2024 | Accented Speech Recognition, Automatic Speech Recognition | Unverified 0
Probing the Feasibility of Multilingual Speaker Anonymization | Jul 3, 2024 | Speaker anonymization, Speech Synthesis | Unverified 0
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | Jul 2, 2024 | Inference Optimization, Speech Synthesis | Unverified 0
A Comprehensive Survey on Diffusion Models and Their Applications | Jul 1, 2024 | Speech Synthesis, Survey | Unverified 0
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters | Jul 1, 2024 | Decoder, Speech Synthesis | Unverified 0
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis | Jun 30, 2024 | CPU, Decoder | Unverified 0
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Jun 27, 2024 | Speech Synthesis, text-to-speech | Code Available 2
High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model | Jun 25, 2024 | Computational Efficiency, Language Modeling | Unverified 0
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment | Jun 25, 2024 | Decoder, Language Modeling | Unverified 0
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation | Jun 25, 2024 | Speech Synthesis, text-to-speech | Unverified 0
Towards Zero-Shot Text-To-Speech for Arabic Dialects | Jun 24, 2024 | Dialect Identification, Speech Synthesis | Unverified 0
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection | Jun 24, 2024 | Audio Deepfake Detection, DeepFake Detection | Code Available 0
A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge | Jun 22, 2024 | Speech Synthesis, text-to-speech | Unverified 0
A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis | Jun 18, 2024 | Decoder, Speech Synthesis | Unverified 0
1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis | Jun 17, 2024 | Diversity, Speech Synthesis | Unverified 0
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis | Jun 16, 2024 | Disentanglement, Speech Synthesis | Unverified 0
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis | Jun 15, 2024 | Expressive Speech Synthesis, Speech Synthesis | Code Available 1
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis | Jun 13, 2024 | Quantization, Speech Synthesis | Unverified 0
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models | Jun 12, 2024 | Language Modeling, Language Modelling | Unverified 0
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment | Jun 12, 2024 | Quantization, Speech Synthesis | Unverified 0
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems | Jun 11, 2024 | Audio Synthesis, Face Swapping | Unverified 0
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning, Speech Synthesis | Unverified 0
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis | Jun 10, 2024 | Speech Synthesis | Unverified 0
Meta Learning Text-to-Speech Synthesis in over 7000 Languages | Jun 10, 2024 | Meta-Learning, Speech Synthesis | Unverified 0
Text-aware and Context-aware Expressive Audiobook Speech Synthesis | Jun 9, 2024 | Contrastive Learning, Language Modeling | Unverified 0
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers | Jun 8, 2024 | Speech Synthesis, text-to-speech | Unverified 0
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | Unverified 0
Spectral Codecs: Improving Non-Autoregressive Speech Synthesis with Spectrogram-Based Audio Codecs | Jun 7, 2024 | Quantization, Speech Synthesis | Unverified 0
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Jun 6, 2024 | Decoder, Inductive Bias | Code Available 2
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model | Jun 6, 2024 | Language Modeling, Language Modelling | Unverified 0
Style Mixture of Experts for Expressive Text-To-Speech Synthesis | Jun 5, 2024 | Mixture-of-Experts, Speech Synthesis | Unverified 0
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR), de-en | Code Available 5
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis | Jun 4, 2024 | In-Context Learning, Language Modeling | Unverified 0
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control | Jun 3, 2024 | Speech Synthesis, text-to-speech | Code Available 3
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training | Jun 3, 2024 | Speech Synthesis, text-to-speech | Unverified 0
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback | Jun 2, 2024 | Speech Synthesis, text-to-speech | Unverified 0
Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction | May 31, 2024 | Speech Synthesis | Code Available 5
Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning | May 23, 2024 | Speech Synthesis, text-to-speech | Unverified 0
DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models | May 23, 2024 | Image Generation, reinforcement-learning | Unverified 0
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | May 16, 2024 | Hallucination, Language Modeling | Unverified 0
Expressivity and Speech Synthesis | Apr 30, 2024 | Expressive Speech Synthesis, Speech Synthesis | Unverified 0
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts | Apr 29, 2024 | Contrastive Learning, Speech Synthesis | Code Available 1
FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code Available 3
Retrieval-Augmented Audio Deepfake Detection | Apr 22, 2024 | Audio Deepfake Detection, DeepFake Detection | Unverified 0
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications | Apr 21, 2024 | Computational Efficiency, Model Optimization | Unverified 0
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Apr 10, 2024 | Speech Synthesis, text-to-speech | Code Available 2
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Apr 6, 2024 | Domain Adaptation, Speech Synthesis | Code Available 1
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | Apr 4, 2024 | Language Modeling, Language Modelling | Unverified 0
Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation | Apr 3, 2024 | Speech Synthesis | Unverified 0
PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders | Apr 3, 2024 | Representation Learning, Speaker Verification | Unverified 0