| DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech | Oct 17, 2024 | Disentanglement, Quantization | Unverified | 0 | 0 |
| DASB -- Discrete Audio and Speech Benchmark | Jun 20, 2024 | Benchmarking, Emotion Recognition | Unverified | 0 | 0 |
| Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios | Jun 7, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys | Nov 18, 2023 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech | Jan 19, 2024 | Self-Supervised Learning, text-to-speech | Unverified | 0 | 0 |
| Data Efficient Voice Cloning for Neural Singing Synthesis | Feb 19, 2019 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System | Apr 20, 2020 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Data Redaction from Conditional Generative Models | May 18, 2023 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | Sep 11, 2024 | Adversarial Attack, Audio Synthesis | Unverified | 0 | 0 |
| Debatts: Zero-Shot Debating Text-to-Speech Synthesis | Nov 10, 2024 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation | Mar 28, 2025 | Audio Generation, Audio-Visual Synchronization | Unverified | 0 | 0 |
| Deep Denoising Auto-encoder for Statistical Speech Synthesis | Jun 17, 2015 | Denoising, Speech Synthesis | Unverified | 0 | 0 |
| Deep Feed-forward Sequential Memory Networks for Speech Synthesis | Feb 26, 2018 | speech-recognition, Speech Recognition | Unverified | 0 | 0 |
| Deep Performer: Score-to-Audio Music Performance Synthesis | Feb 12, 2022 | Decoder, Speech Synthesis | Unverified | 0 | 0 |
| Deep Shallow Fusion for RNN-T Personalization | Nov 16, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Deep Text-to-Speech System with Seq2Seq Model | Mar 11, 2019 | model, Speech Synthesis | Unverified | 0 | 0 |
| Deliberation Model for On-Device Spoken Language Understanding | Apr 4, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition | May 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Denoising Text to Speech with Frame-Level Noise Modeling | Dec 17, 2020 | Denoising, text-to-speech | Unverified | 0 | 0 |
| Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis | Apr 14, 2021 | Dependency Parsing, Representation Learning | Unverified | 0 | 0 |
| Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control | Sep 26, 2024 | Self-Supervised Learning, text-to-speech | Unverified | 0 | 0 |
| Designing French Tale Corpora for Entertaining Text To Speech Synthesis | May 1, 2012 | Sentence, Speech Synthesis | Unverified | 0 | 0 |
| Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention | Dec 29, 2020 | Data Augmentation, text-to-speech | Unverified | 0 | 0 |
| Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability | Jul 30, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Development and Evaluation of Speech Synthesis Corpora for Latvian | May 1, 2020 | speech-recognition, Speech Recognition | Unverified | 0 | 0 |
| Development of an Inclusive Educational Platform Using Open Technologies and Machine Learning: A Case Study on Accessibility Enhancement | Jan 22, 2025 | Object Recognition, speech-recognition | Unverified | 0 | 0 |
| Development of Marathi Part of Speech Tagger Using Statistical Approach | Oct 2, 2013 | Information Retrieval, Part-Of-Speech Tagging | Unverified | 0 | 0 |
| Development of Smartcall Vietnamese Text-to-Speech for VLSP 2020 | Dec 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech | Oct 29, 2020 | Decoder, text-to-speech | Unverified | 0 | 0 |
| Diacritization of Maghrebi Arabic Sub-Dialects | Oct 15, 2018 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech | May 26, 2025 | Attribute, Emotional Speech Synthesis | Unverified | 0 | 0 |
| AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling | Mar 21, 2022 | Decoder, Speech Synthesis | Unverified | 0 | 0 |
| DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs | Jan 28, 2022 | Denoising, Speech Synthesis | Unverified | 0 | 0 |
| DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles | Dec 4, 2024 | Prosody Prediction, text-to-speech | Unverified | 0 | 0 |
| Diff-TTS: A Denoising Diffusion Model for Text-to-Speech | Apr 3, 2021 | Denoising, GPU | Unverified | 0 | 0 |
| DiffVoice: Text-to-Speech with Latent Diffusion | Apr 23, 2023 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Digital Einstein Experience: Fast Text-to-Speech for Conversational AI | Jul 21, 2021 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Direct Speech to Speech Translation: A Review | Mar 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Direct Text to Speech Translation System using Acoustic Units | Sep 14, 2023 | Decoder, Speech-to-Speech Translation | Unverified | 0 | 0 |
| Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT | Jan 2, 2025 | Polyphone disambiguation, Sentence | Unverified | 0 | 0 |
| Discovering the Italian literature: interactive access to audio indexed text resources | May 1, 2014 | Cultural Vocal Bursts Intensity Prediction, Sentence | Unverified | 0 | 0 |
| DiscreTalk: Text-to-Speech as a Machine Translation Problem | May 12, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech | Oct 24, 2021 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing | Jun 4, 2024 | Decoder, Language Modeling | Unverified | 0 | 0 |
| Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization | Oct 30, 2018 | Data Augmentation, Disentanglement | Unverified | 0 | 0 |
| DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction | May 26, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage | Jun 13, 2024 | Sentence, text-to-speech | Unverified | 0 | 0 |
| Distribution augmentation for low-resource expressive text-to-speech | Feb 13, 2022 | Data Augmentation, Diversity | Unverified | 0 | 0 |
| DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | Oct 14, 2024 | Denoising, Speaker Verification | Unverified | 0 | 0 |
| DNN-based Speech Synthesis for Indian Languages from ASCII text | Aug 18, 2016 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |