| Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models | Nov 17, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition | Oct 26, 2020 | Emotion Recognition, Speech Emotion Recognition | Unverified | 0 | 0 |
| EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model | Jun 17, 2021 | Emotional Speech Synthesis, Emotion Classification | Unverified | 0 | 0 |
| EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting | Apr 17, 2025 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | Jan 16, 2022 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | Jul 1, 2022 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Emphasis control for parallel neural TTS | Oct 6, 2021 | Sentence, text-to-speech | Unverified | 0 | 0 |
| Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting | Aug 20, 2024 | Keyword Spotting, text-to-speech | Unverified | 0 | 0 |
| Emphasized Accent Phrase Prediction from Text for Advertisement Text-To-Speech Synthesis | Dec 1, 2014 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition | Feb 20, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| EmoCat: Language-agnostic Emotional Voice Conversion | Jan 14, 2021 | Decoder, text-to-speech | Unverified | 0 | 0 |
| Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | Apr 10, 2025 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation | May 10, 2025 | Grapheme-to-Phoneme Conversion, Large Language Model | Unverified | 0 | 0 |
| BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights | Jan 29, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | Aug 9, 2021 | Talking Head Generation, text-to-speech | Unverified | 0 | 0 |
| End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator | Oct 31, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering | Jan 14, 2024 | Audio Generation, Language Modeling | Unverified | 0 | 0 |
| ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams | Oct 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Jul 19, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 | 0 |
| End-to-end speech recognition modeling from de-identified data | Jul 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue | Jun 24, 2022 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning | Apr 13, 2019 | Cross-Lingual Transfer, text-to-speech | Unverified | 0 | 0 |
| End-to-End Text-to-Speech using Latent Duration based on VQ-VAE | Oct 19, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | Apr 6, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch | Apr 12, 2022 | Sentence, text-to-speech | Unverified | 0 | 0 |
| Enhancing audio quality for expressive Neural Text-to-Speech | Aug 13, 2021 | Acoustic Modelling, Speech Synthesis | Unverified | 0 | 0 |
| Enhancing Crowdsourced Audio for Text-to-Speech Models | Oct 17, 2024 | Denoising, text-to-speech | Unverified | 0 | 0 |
| Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap | Oct 22, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch | Oct 9, 2024 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech | Sep 14, 2019 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Enhancing Speech-to-Speech Translation with Multiple TTS Targets | Apr 10, 2023 | Speech-to-Speech Translation, Speech-to-Text | Unverified | 0 | 0 |
| Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck | Apr 4, 2022 | Speaker Verification, text-to-speech | Unverified | 0 | 0 |
| Adversarial speech for voice privacy protection from Personalized Speech generation | Jan 22, 2024 | Speaker Verification, text-to-speech | Unverified | 0 | 0 |
| Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations | Feb 5, 2024 | Decoder, In-Context Learning | Unverified | 0 | 0 |
| Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback | Jun 2, 2024 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Ensemble prosody prediction for expressive speech synthesis | Apr 3, 2023 | Diversity, Ensemble Learning | Unverified | 0 | 0 |
| Environment Aware Text-to-Speech Synthesis | Oct 8, 2021 | Attribute, Disentanglement | Unverified | 0 | 0 |
| EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models | Sep 22, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| A Comparative Analysis of Pretrained Language Models for Text-to-Speech | Sep 4, 2023 | Natural Language Understanding, Prediction | Unverified | 0 | 0 |
| Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS | Oct 24, 2022 | Data Augmentation, GPU | Unverified | 0 | 0 |
| Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio | Nov 25, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| ESPnet2-TTS: Extending the Edge of TTS Research | Oct 15, 2021 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Efficient Incremental Text-to-Speech on GPUs | Nov 25, 2022 | GPU, Speech Synthesis | Unverified | 0 | 0 |
| ESPnet-ST: All-in-One Speech Translation Toolkit | Apr 21, 2020 | All, Automatic Speech Recognition | Unverified | 0 | 0 |
| Boosting Large Language Model for Speech Synthesis: An Empirical Study | Dec 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts | Oct 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Evaluating and Personalizing User-Perceived Quality of Text-to-Speech Voices for Delivering Mindfulness Meditation with Different Physical Embodiments | Jan 7, 2024 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Evaluating and reducing the distance between synthetic and real speech distributions | Nov 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs | Sep 9, 2019 | Form, Speech Synthesis | Unverified | 0 | 0 |
| An overview of text-to-speech systems and media applications | Oct 22, 2023 | Acoustic Modelling, text-to-speech | Unverified | 0 | 0 |