| DNN-based Speech Synthesis for Indian Languages from ASCII text | Aug 18, 2016 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | Oct 14, 2024 | Denoising, Speaker Verification | —Unverified | 0 | 0 |
| BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data | Feb 12, 2024 | Decoder, Disentanglement | —Unverified | 0 | 0 |
| Distribution augmentation for low-resource expressive text-to-speech | Feb 13, 2022 | Data Augmentation, Diversity | —Unverified | 0 | 0 |
| DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage | Jun 13, 2024 | Sentence, text-to-speech | —Unverified | 0 | 0 |
| An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS | Jun 9, 2024 | Denoising, Speech Denoising | —Unverified | 0 | 0 |
| Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters | Jun 19, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction | May 26, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization | Oct 30, 2018 | Data Augmentation, Disentanglement | —Unverified | 0 | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition, Language Modeling | —Unverified | 0 | 0 |
| Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing | Jun 4, 2024 | Decoder, Language Modeling | —Unverified | 0 | 0 |
| Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech | Oct 24, 2021 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS | Oct 9, 2024 | Diversity, Speech Synthesis | —Unverified | 0 | 0 |
| An In-depth Analysis of the Effect of Text Normalization in Social Media | May 1, 2015 | Dependency Parsing, named-entity-recognition | —Unverified | 0 | 0 |
| DiscreTalk: Text-to-Speech as a Machine Translation Problem | May 12, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Discovering the Italian literature: interactive access to audio indexed text resources | May 1, 2014 | Cultural Vocal Bursts Intensity Prediction, Sentence | —Unverified | 0 | 0 |
| Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT | Jan 2, 2025 | Polyphone disambiguation, Sentence | —Unverified | 0 | 0 |
| Direct Text to Speech Translation System using Acoustic Units | Sep 14, 2023 | Decoder, Speech-to-Speech Translation | —Unverified | 0 | 0 |
| Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation | Nov 17, 2022 | Data Augmentation, Machine Translation | —Unverified | 0 | 0 |
| An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer | Jan 4, 2018 | Neural Network simulation, text-to-speech | —Unverified | 0 | 0 |
| A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data | Jan 21, 2025 | Domain Adaptation, speech-recognition | —Unverified | 0 | 0 |
| A Challenge Set and Methods for Noun-Verb Ambiguity | Oct 1, 2018 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Voice Impression Control in Zero-Shot TTS | Jun 6, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | Jul 31, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Direct Speech to Speech Translation: A Review | Mar 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers | Apr 16, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Digital Einstein Experience: Fast Text-to-Speech for Conversational AI | Jul 21, 2021 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| DiffVoice: Text-to-Speech with Latent Diffusion | Apr 23, 2023 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | Jun 25, 2025 | Speaker Recognition, text-to-speech | —Unverified | 0 | 0 |
| Diff-TTS: A Denoising Diffusion Model for Text-to-Speech | Apr 3, 2021 | Denoising, GPU | —Unverified | 0 | 0 |
| AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis | Apr 14, 2025 | RAG, Retrieval-augmented Generation | —Unverified | 0 | 0 |
| DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles | Dec 4, 2024 | Prosody Prediction, text-to-speech | —Unverified | 0 | 0 |
| Auto Spell Suggestion for High Quality Speech Synthesis in Hindi | Feb 15, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| An Expert System for Automatic Reading of A Text Written in Standard Arabic | May 8, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| ADEPT: A Dataset for Evaluating Prosody Transfer | Jun 15, 2021 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs | Jan 28, 2022 | Denoising, Speech Synthesis | —Unverified | 0 | 0 |
| Autoregressive Speech Synthesis without Vector Quantization | Jul 11, 2024 | Audio Compression, Diversity | —Unverified | 0 | 0 |
| AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling | Mar 21, 2022 | Decoder, Speech Synthesis | —Unverified | 0 | 0 |
| Autoregressive Speech Synthesis with Next-Distribution Prediction | Dec 22, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis | Dec 8, 2023 | Benchmarking, Quantization | —Unverified | 0 | 0 |
| DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech | May 26, 2025 | Attribute, Emotional Speech Synthesis | —Unverified | 0 | 0 |
| Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | —Unverified | 0 | 0 |
| Diacritization of Maghrebi Arabic Sub-Dialects | Oct 15, 2018 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech | Nov 28, 2016 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR | Mar 11, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| A Deep Generative Acoustic Model for Compositional Automatic Speech Recognition | Oct 23, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech | Oct 29, 2020 | Decoder, text-to-speech | —Unverified | 0 | 0 |
| Development of Smartcall Vietnamese Text-to-Speech for VLSP 2020 | Dec 1, 2020 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Automatic Speech Recognition for Hindi | Jun 26, 2024 | Action Detection, Activity Detection | —Unverified | 0 | 0 |
| Development of Marathi Part of Speech Tagger Using Statistical Approach | Oct 2, 2013 | Information Retrieval, Part-Of-Speech Tagging | —Unverified | 0 | 0 |