DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis Oct 17, 2024 Speech Synthesis text-to-speech
— Unverified 0Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch Oct 9, 2024 Speech Synthesis text-to-speech
— Unverified 0Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS Oct 9, 2024 Diversity Speech Synthesis
— Unverified 0HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis Oct 6, 2024 Language Modeling Language Modelling
— Unverified 0Generative Semantic Communication for Text-to-Speech Synthesis Oct 4, 2024 Quantization Semantic Communication
— Unverified 0Accent conversion using discrete units with parallel data synthesized from controllable accented TTS Sep 30, 2024 Data Augmentation Speech Synthesis
— Unverified 0StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis Sep 24, 2024 Speech Synthesis text-to-speech
— Unverified 0StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion Sep 16, 2024 Speech Synthesis text-to-speech
— Unverified 0Text-To-Speech Synthesis In The Wild Sep 13, 2024 Benchmarking Speaker Recognition
— Unverified 0Full-text Error Correction for Chinese Speech Recognition with Large Language Model Sep 12, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
— Unverified 0What happens to diffusion model likelihood when your model is conditional? Sep 10, 2024 domain classification model
— Unverified 0AS-Speech: Adaptive Style For Speech Synthesis Sep 9, 2024 Rhythm Speech Synthesis
— Unverified 0Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks Jul 26, 2024 Generative Adversarial Network Speech Enhancement
— Unverified 0Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models Jul 18, 2024 Language Modeling Language Modelling
— Unverified 0Autoregressive Speech Synthesis without Vector Quantization Jul 11, 2024 Audio Compression Diversity
— Unverified 0Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis Jul 4, 2024 Accented Speech Recognition Automatic Speech Recognition
— Unverified 0Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization Jul 2, 2024 Inference Optimization Speech Synthesis
— Unverified 0FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis Jun 30, 2024 CPU Decoder
— Unverified 0Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis Jun 16, 2024 Disentanglement Speech Synthesis
— Unverified 0VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment Jun 12, 2024 Quantization Speech Synthesis
— Unverified 0Meta Learning Text-to-Speech Synthesis in over 7000 Languages Jun 10, 2024 Meta-Learning Speech Synthesis
Code Code Available 0Autoregressive Diffusion Transformer for Text-to-Speech Synthesis Jun 8, 2024 Audio Generation Decoder
— Unverified 0VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers Jun 8, 2024 Speech Synthesis text-to-speech
— Unverified 0Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model Jun 6, 2024 Language Modeling Language Modelling
— Unverified 0Style Mixture of Experts for Expressive Text-To-Speech Synthesis Jun 5, 2024 Mixture-of-Experts Speech Synthesis
— Unverified 0Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis Jun 4, 2024 In-Context Learning Language Modeling
— Unverified 0Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback Jun 2, 2024 Speech Synthesis text-to-speech
— Unverified 0DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models May 23, 2024 Image Generation reinforcement-learning
— Unverified 0Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model May 16, 2024 Hallucination Language Modeling
— Unverified 0RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis Apr 4, 2024 Language Modeling Language Modelling
— Unverified 0PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders Apr 3, 2024 Representation Learning Speaker Verification
— Unverified 0Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting Feb 19, 2024 Language Modeling Language Modelling
Code Code Available 0Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters Jan 10, 2024 Self-Supervised Learning Speech Enhancement
— Unverified 0Boosting Large Language Model for Speech Synthesis: An Empirical Study Dec 30, 2023 Language Modeling Language Modelling
— Unverified 0Normalization of Lithuanian Text Using Regular Expressions Dec 29, 2023 Speech Synthesis Text Normalization
— Unverified 0MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis Dec 17, 2023 Speech Synthesis Style Transfer
— Unverified 0An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis Dec 8, 2023 Benchmarking Quantization
— Unverified 0Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis Dec 6, 2023 Speech Synthesis text-to-speech
— Unverified 0Code-Mixed Text to Speech Synthesis under Low-Resource Constraints Dec 2, 2023 Speech Synthesis text-to-speech
— Unverified 0Guided Flows for Generative Modeling and Decision Making Nov 22, 2023 Conditional Image Generation Decision Making
— Unverified 0Generative Pre-training for Speech with Flow Matching Oct 25, 2023 Speech Enhancement Speech Synthesis
— Unverified 0Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors Oct 25, 2023 en-US domain classification en-US Intent Classification
Code Code Available 0Attentive Multi-Layer Perceptron for Non-autoregressive Generation Oct 14, 2023 Machine Translation Speech Synthesis
Code Code Available 0Unified speech and gesture synthesis using flow matching Oct 8, 2023 Audio Synthesis Motion Synthesis
— Unverified 0The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains Oct 4, 2023 Speech Synthesis text-to-speech
— Unverified 0DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis Sep 22, 2023 Denoising Speech Synthesis
— Unverified 0The FruitShell French synthesis system at the Blizzard 2023 Challenge Sep 1, 2023 Data Augmentation Speech Synthesis
— Unverified 0Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis Aug 31, 2023 Expressive Speech Synthesis Sentence
— Unverified 0SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis Aug 2, 2023 Decoder Self-Supervised Learning
— Unverified 0Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech Jul 31, 2023 Acoustic Modelling Speech Synthesis
— Unverified 0