Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus Dec 20, 2021 Audio Generation Singing Voice Synthesis
Code Code Available 1YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone Dec 4, 2021 Speech Synthesis Text-To-Speech Synthesis
Code Code Available 1Fine-grained style control in Transformer-based Text-to-speech Synthesis Oct 12, 2021 Inductive Bias Speech Synthesis
Code Code Available 1EdiTTS: Score-based Editing for Controllable Text-to-Speech Oct 6, 2021 Speech Synthesis Speech-to-Text
Code Code Available 1WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Jun 17, 2021 Speech Synthesis text-to-speech
Code Code Available 1RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis Jun 15, 2021 speech-recognition Speech Recognition
Code Code Available 1Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling Jun 11, 2021 Speech Synthesis text-to-speech
Code Code Available 1RAD-TTS: Parallel Flow-Based TTS with Robust Alignment Learning and Diverse Synthesis Jun 2, 2021 Diversity Rhythm
Code Code Available 1Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech May 13, 2021 Decoder Speech Synthesis
Code Code Available 1KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset Apr 17, 2021 Speech Synthesis text-to-speech
Code Code Available 1Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities Dec 1, 2020 Chinese Word Segmentation Speech Synthesis
Code Code Available 1Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis Nov 6, 2020 Decoder Speech Synthesis
Code Code Available 1Semi-supervised URL Segmentation with Recurrent Neural NetworksPre-trained on Knowledge Graph Entities Nov 5, 2020 Chinese Word Segmentation Speech Synthesis
Code Code Available 1Effective Deep Learning Models for Automatic Diacritization of Arabic Text Nov 1, 2020 Arabic Text Diacritization Decoder
Code Code Available 1WaveGrad: Estimating Gradients for Waveform Generation Sep 2, 2020 Speech Synthesis Text-To-Speech Synthesis
Code Code Available 1Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion Aug 13, 2020 Speech Synthesis text-to-speech
Code Code Available 1FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Jun 8, 2020 Knowledge Distillation Speech Synthesis
Code Code Available 1End-to-End Adversarial Text-to-Speech Jun 5, 2020 Adversarial Text Dynamic Time Warping
Code Code Available 1Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search May 22, 2020 text-to-speech Text to Speech
Code Code Available 1Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis May 12, 2020 Speech Synthesis Style Transfer
Code Code Available 1In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data Apr 4, 2019 Speech Synthesis text-to-speech
Code Code Available 1Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis Mar 27, 2019 Emotional Speech Synthesis Expressive Speech Synthesis
Code Code Available 1Exploring Transfer Learning for Low Resource Emotional TTS Jan 14, 2019 Deep Learning Emotional Speech Synthesis
Code Code Available 1Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Mar 23, 2018 Speech Synthesis Style Transfer
Code Code Available 1Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention Oct 24, 2017 text-to-speech Text to Speech
Code Code Available 1Tacotron: Towards End-to-End Speech Synthesis Mar 29, 2017 Audio Synthesis Speech Synthesis
Code Code Available 1S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation Jun 11, 2025 Reading Comprehension Speech Synthesis
— Unverified 0A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions Jun 4, 2025 Data Augmentation Diversity
— Unverified 0CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech Jun 3, 2025 Speech Synthesis text-to-speech
— Unverified 0SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction Jun 2, 2025 Speech Synthesis text-to-speech
— Unverified 0Chain-of-Thought Training for Open E2E Spoken Dialogue Systems May 31, 2025 Language Modeling Language Modelling
— Unverified 0Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling May 26, 2025 Sentence Speech Synthesis
— Unverified 0Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis May 25, 2025 Speech Synthesis text-to-speech
— Unverified 0FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation May 20, 2025 Dataset Generation Speech Synthesis
— Unverified 0Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis May 18, 2025 Speech Synthesis text-to-speech
— Unverified 0Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications May 12, 2025 Speech Synthesis text-to-speech
— Unverified 0A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models Apr 22, 2025 cross-modal alignment Script Generation
— Unverified 0AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis Apr 14, 2025 RAG Retrieval-augmented Generation
— Unverified 0Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis Apr 14, 2025 Language Modeling Language Modelling
— Unverified 0ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech Feb 13, 2025 Adversarial Attack Adversarial Attack Detection
— Unverified 0Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron Jan 10, 2025 Speech Synthesis text-to-speech
— Unverified 0PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control Jan 10, 2025 Speech Synthesis text-to-speech
— Unverified 0Probing Speaker-specific Features in Speaker Representations Jan 9, 2025 Self-Supervised Learning Speaker Verification
— Unverified 0Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting Dec 28, 2024 Speech Synthesis text-to-speech
— Unverified 0Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis Dec 22, 2024 Decoder Disentanglement
— Unverified 0ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis Dec 16, 2024 Speech Synthesis text-to-speech
— Unverified 0Efficient Generative Modeling with Residual Vector Quantization-Based Tokens Dec 13, 2024 Conditional Image Generation Image Generation
— Unverified 0Multimodal Latent Language Modeling with Next-Token Diffusion Dec 11, 2024 Image Generation Language Modeling
Code Code Available 0Debatts: Zero-Shot Debating Text-to-Speech Synthesis Nov 10, 2024 Speech Synthesis text-to-speech
— Unverified 0A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages Oct 18, 2024 Speech Synthesis text-to-speech
— Unverified 0