| DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | Jun 25, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction | Mar 1, 2023 | Dynamic Time Warping, Metric Learning | —Unverified | 0 |
| An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer | Jan 4, 2018 | Neural Network simulation, text-to-speech | —Unverified | 0 |
| Dual Script E2E framework for Multilingual and Code-Switching ASR | Jun 2, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance | Aug 26, 2024 | Diversity, text-to-speech | —Unverified | 0 |
| BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model | Jul 4, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | Jun 13, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech | Feb 27, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Voice Impression Control in Zero-Shot TTS | Jun 6, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Enhancing Crowdsourced Audio for Text-to-Speech Models | Oct 17, 2024 | Denoising, text-to-speech | —Unverified | 0 |
| DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis | Sep 22, 2023 | Denoising, Speech Synthesis | —Unverified | 0 |
| Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection | Dec 2, 2019 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Enhancing Speech-to-Speech Translation with Multiple TTS Targets | Apr 10, 2023 | Speech-to-Speech Translation, Speech-to-Text | —Unverified | 0 |
| Ensemble prosody prediction for expressive speech synthesis | Apr 3, 2023 | Diversity, Ensemble Learning | —Unverified | 0 |
| ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs | Oct 16, 2024 | Diversity, Online Clustering | —Unverified | 0 |
| A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers | Apr 16, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Direct Speech to Speech Translation: A Review | Mar 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | Jun 25, 2025 | Speaker Recognition, text-to-speech | —Unverified | 0 |
| Digital Einstein Experience: Fast Text-to-Speech for Conversational AI | Jul 21, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| Effective Decoder Masking for Transformer Based End-to-End Speech Recognition | Oct 27, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| DiffVoice: Text-to-Speech with Latent Diffusion | Apr 23, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition | Mar 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization | Feb 4, 2020 | Bayesian Optimization, text-to-speech | —Unverified | 0 |
| Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment | Oct 28, 2019 | Hard Attention, Speech Synthesis | —Unverified | 0 |
| Efficient data selection employing Semantic Similarity-based Graph Structures for model training | Feb 22, 2024 | Semantic Similarity, Semantic Textual Similarity | —Unverified | 0 |
| Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Dec 13, 2024 | Conditional Image Generation, Image Generation | —Unverified | 0 |
| Efficient Incremental Text-to-Speech on GPUs | Nov 25, 2022 | GPU, Speech Synthesis | —Unverified | 0 |
| AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis | Apr 14, 2025 | RAG, Retrieval-augmented Generation | —Unverified | 0 |
| Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS | Oct 24, 2022 | Data Augmentation, GPU | —Unverified | 0 |
| An Expert System for Automatic Reading of A Text Written in Standard Arabic | May 8, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles | Dec 4, 2024 | Prosody Prediction, text-to-speech | —Unverified | 0 |
| Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch | Oct 9, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams | Oct 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering | Jan 14, 2024 | Audio Generation, Language Modeling | —Unverified | 0 |
| Auto Spell Suggestion for High Quality Speech Synthesis in Hindi | Feb 15, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights | Jan 29, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| ADEPT: A Dataset for Evaluating Prosody Transfer | Jun 15, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| EmoCat: Language-agnostic Emotional Voice Conversion | Jan 14, 2021 | Decoder, text-to-speech | —Unverified | 0 |
| End-to-end speech recognition modeling from de-identified data | Jul 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization | Sep 16, 2024 | Emotional Speech Synthesis, In-Context Learning | —Unverified | 0 |
| Autoregressive Speech Synthesis without Vector Quantization | Jul 11, 2024 | Audio Compression, Diversity | —Unverified | 0 |
| BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text | Aug 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis | Dec 8, 2023 | Benchmarking, Quantization | —Unverified | 0 |
| AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling | Mar 21, 2022 | Decoder, Speech Synthesis | —Unverified | 0 |
| Autoregressive Speech Synthesis with Next-Distribution Prediction | Dec 22, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR | Mar 11, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech | May 26, 2025 | Attribute, Emotional Speech Synthesis | —Unverified | 0 |
| Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions | Sep 25, 2024 | Attribute, Dimensionality Reduction | —Unverified | 0 |
| Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | —Unverified | 0 |
| Diacritization of Maghrebi Arabic Sub-Dialects | Oct 15, 2018 | text-to-speech, Text to Speech | —Unverified | 0 |