EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech | Jun 12, 2024 | Emotional Speech Synthesis, text-to-speech | Code available (2)
Meta Learning Text-to-Speech Synthesis in over 7000 Languages | Jun 10, 2024 | Meta-Learning, Speech Synthesis | Code available (0)
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers | Jun 8, 2024 | Speech Synthesis, text-to-speech | Unverified (0)
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | Unverified (0)
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model | Jun 6, 2024 | Language Modeling, Language Modelling | Unverified (0)
Style Mixture of Experts for Expressive Text-To-Speech Synthesis | Jun 5, 2024 | Mixture-of-Experts, Speech Synthesis | Unverified (0)
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR), de-en | Code available (5)
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis | Jun 4, 2024 | In-Context Learning, Language Modeling | Unverified (0)
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback | Jun 2, 2024 | Speech Synthesis, text-to-speech | Unverified (0)
DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models | May 23, 2024 | Image Generation, reinforcement-learning | Unverified (0)
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | May 16, 2024 | Hallucination, Language Modeling | Unverified (0)
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts | Apr 29, 2024 | Contrastive Learning, Speech Synthesis | Code available (1)
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | Apr 4, 2024 | Language Modeling, Language Modelling | Unverified (0)
PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders | Apr 3, 2024 | Representation Learning, Speaker Verification | Unverified (0)
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis | Apr 1, 2024 | Speech Synthesis, text-to-speech | Code available (1)
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models | Mar 31, 2024 | Denoising, Speech Synthesis | Code available (2)
Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting | Feb 19, 2024 | Language Modeling, Language Modelling | Code available (0)
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters | Jan 10, 2024 | Self-Supervised Learning, Speech Enhancement | Unverified (0)
Boosting Large Language Model for Speech Synthesis: An Empirical Study | Dec 30, 2023 | Language Modeling, Language Modelling | Unverified (0)
Normalization of Lithuanian Text Using Regular Expressions | Dec 29, 2023 | Speech Synthesis, Text Normalization | Unverified (0)
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis | Dec 17, 2023 | Speech Synthesis, Style Transfer | Unverified (0)
An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis | Dec 8, 2023 | Benchmarking, Quantization | Unverified (0)
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis | Dec 6, 2023 | Speech Synthesis, text-to-speech | Unverified (0)
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints | Dec 2, 2023 | Speech Synthesis, text-to-speech | Unverified (0)
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Nov 24, 2023 | Dimensionality Reduction, Emotion Classification | Code available (1)
Guided Flows for Generative Modeling and Decision Making | Nov 22, 2023 | Conditional Image Generation, Decision Making | Unverified (0)
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Nov 7, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available (1)
Generative Pre-training for Speech with Flow Matching | Oct 25, 2023 | Speech Enhancement, Speech Synthesis | Unverified (0)
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors | Oct 25, 2023 | en-US domain classification, en-US Intent Classification | Code available (0)
ArTST: Arabic Text and Speech Transformer | Oct 25, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available (1)
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling | Oct 14, 2023 | Speech Synthesis, text-to-speech | Code available (2)
Attentive Multi-Layer Perceptron for Non-autoregressive Generation | Oct 14, 2023 | Machine Translation, Speech Synthesis | Code available (0)
Unified speech and gesture synthesis using flow matching | Oct 8, 2023 | Audio Synthesis, Motion Synthesis | Unverified (0)
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code available (2)
The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains | Oct 4, 2023 | Speech Synthesis, text-to-speech | Unverified (0)
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis | Sep 22, 2023 | Denoising, Speech Synthesis | Unverified (0)
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Sep 14, 2023 | Automatic Speech Recognition, speech-recognition | Code available (2)
Matcha-TTS: A fast TTS architecture with conditional flow matching | Sep 6, 2023 | Acoustic Modelling, Decoder | Code available (3)
The FruitShell French synthesis system at the Blizzard 2023 Challenge | Sep 1, 2023 | Data Augmentation, Speech Synthesis | Unverified (0)
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Aug 31, 2023 | Representation Learning, Speech Representation Learning | Code available (1)
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis | Aug 31, 2023 | Expressive Speech Synthesis, Sentence | Unverified (0)
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Aug 3, 2023 | Decoder, Quantization | Code available (1)
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | Aug 2, 2023 | Decoder, Self-Supervised Learning | Unverified (0)
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech | Jul 31, 2023 | Acoustic Modelling, Speech Synthesis | Unverified (0)
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs | Jul 18, 2023 | Generative Adversarial Network, Language Modeling | Unverified (0)
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | Jun 29, 2023 | Speech Synthesis, text-to-speech | Unverified (0)
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | Jun 23, 2023 | In-Context Learning, Speech Synthesis | Code available (0)
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration | May 25, 2023 | Speech Synthesis, text-to-speech | Code available (1)
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | May 23, 2023 | Speech Synthesis, text-to-speech | Unverified (0)
VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages | May 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified (0)