| AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling | Mar 21, 2022 | Decoder, Speech Synthesis | —Unverified | 0 |
| Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition | Oct 26, 2020 | Emotion Recognition, Speech Emotion Recognition | —Unverified | 0 |
| EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model | Jun 17, 2021 | Emotional Speech Synthesis, Emotion Classification | —Unverified | 0 |
| EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting | Apr 17, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | Jan 16, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | Jul 1, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| Emphasis control for parallel neural TTS | Oct 6, 2021 | Sentence, text-to-speech | —Unverified | 0 |
| Building Open-source Speech Technology for Low-resource Minority Languages with Sámi as an Example – Tools, Methods and Experiments | Jun 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Emphasized Accent Phrase Prediction from Text for Advertisement Text-To-Speech Synthesis | Dec 1, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition | Feb 20, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Building Text-to-Speech Systems for Resource Poor Languages | May 1, 2012 | Clustering, Speech Synthesis | —Unverified | 0 |
| Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | Apr 10, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech | Mar 13, 2024 | GPU, Speech Synthesis | —Unverified | 0 |
| Autoregressive Speech Synthesis with Next-Distribution Prediction | Dec 22, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis | Dec 8, 2023 | Benchmarking, Quantization | —Unverified | 0 |
| End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator | Oct 31, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2 | Jan 11, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech | May 26, 2025 | Attribute, Emotional Speech Synthesis | —Unverified | 0 |
| Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | —Unverified | 0 |
| End-to-end speech recognition modeling from de-identified data | Jul 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue | Jun 24, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning | Apr 13, 2019 | Cross-Lingual Transfer, text-to-speech | —Unverified | 0 |
| End-to-End Text-to-Speech using Latent Duration based on VQ-VAE | Oct 19, 2020 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | Apr 6, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch | Apr 12, 2022 | Sentence, text-to-speech | —Unverified | 0 |
| Enhancing audio quality for expressive Neural Text-to-Speech | Aug 13, 2021 | Acoustic Modelling, Speech Synthesis | —Unverified | 0 |
| Enhancing Crowdsourced Audio for Text-to-Speech Models | Oct 17, 2024 | Denoising, text-to-speech | —Unverified | 0 |
| Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap | Oct 22, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Diacritization of Maghrebi Arabic Sub-Dialects | Oct 15, 2018 | text-to-speech, Text to Speech | —Unverified | 0 |
| AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech | Nov 28, 2016 | text-to-speech, Text to Speech | —Unverified | 0 |
| Enhancing Speech-to-Speech Translation with Multiple TTS Targets | Apr 10, 2023 | Speech-to-Speech Translation, Speech-to-Text | —Unverified | 0 |
| An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR | Mar 11, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Expressive Neural Voice Cloning | Jan 30, 2021 | Speech Synthesis, Style Transfer | —Unverified | 0 |
| Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations | Feb 5, 2024 | Decoder, In-Context Learning | —Unverified | 0 |
| Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback | Jun 2, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Ensemble prosody prediction for expressive speech synthesis | Apr 3, 2023 | Diversity, Ensemble Learning | —Unverified | 0 |
| Environment Aware Text-to-Speech Synthesis | Oct 8, 2021 | Attribute, Disentanglement | —Unverified | 0 |
| EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models | Sep 22, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech | Oct 29, 2020 | Decoder, text-to-speech | —Unverified | 0 |
| Development of Smartcall Vietnamese Text-to-Speech for VLSP 2020 | Dec 1, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs | Oct 16, 2024 | Diversity, Online Clustering | —Unverified | 0 |
| ESPnet2-TTS: Extending the Edge of TTS Research | Oct 15, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| Automatic Speech Recognition for Hindi | Jun 26, 2024 | Action Detection, Activity Detection | —Unverified | 0 |
| ESPnet-ST: All-in-One Speech Translation Toolkit | Apr 21, 2020 | All, Automatic Speech Recognition | —Unverified | 0 |
| Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment | Nov 7, 2023 | Decoder, Position | —Unverified | 0 |
| Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts | Oct 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Evaluating and Personalizing User-Perceived Quality of Text-to-Speech Voices for Delivering Mindfulness Meditation with Different Physical Embodiments | Jan 7, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Evaluating and reducing the distance between synthetic and real speech distributions | Nov 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs | Sep 9, 2019 | Form, Speech Synthesis | —Unverified | 0 |
| Development of Marathi Part of Speech Tagger Using Statistical Approach | Oct 2, 2013 | Information Retrieval, Part-Of-Speech Tagging | —Unverified | 0 |