| Development of an Inclusive Educational Platform Using Open Technologies and Machine Learning: A Case Study on Accessibility Enhancement | Jan 22, 2025 | Object Recognition, speech-recognition | —Unverified | 0 |
| Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | May 16, 2024 | Hallucination, Language Modeling | —Unverified | 0 |
| CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech | Apr 3, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Explicit Intensity Control for Accented Text-to-speech | Oct 27, 2022 | speech-recognition, Speech Recognition | —Unverified | 0 |
| A New Approach to Voice Authenticity | Feb 9, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Exploiting Transliterated Words for Finding Similarity in Inter-Language News Articles using Machine Learning | May 29, 2022 | Articles, Machine Translation | —Unverified | 0 |
| Exploring an Inter-Pausal Unit (IPU) based Approach for Indic End-to-End TTS Systems | Sep 18, 2024 | Sentence, text-to-speech | —Unverified | 0 |
| Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation | Apr 8, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Exploring Speech Enhancement for Low-resource Speech Synthesis | Sep 19, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Exploring speech style spaces with language models: Emotional TTS without emotion labels | May 18, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Development and Evaluation of Speech Synthesis Corpora for Latvian | May 1, 2020 | speech-recognition, Speech Recognition | —Unverified | 0 |
| Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability | Jul 30, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners | Feb 28, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention | Dec 29, 2020 | Data Augmentation, text-to-speech | —Unverified | 0 |
| Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis | May 29, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music | Mar 4, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| Designing French Tale Corpora for Entertaining Text To Speech Synthesis | May 1, 2012 | Sentence, Speech Synthesis | —Unverified | 0 |
| Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control | Sep 26, 2024 | Self-Supervised Learning, text-to-speech | —Unverified | 0 |
| Automatic Evaluation of Speaker Similarity | Jul 1, 2022 | Speaker Verification, text-to-speech | —Unverified | 0 |
| Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis | Apr 14, 2021 | Dependency Parsing, Representation Learning | —Unverified | 0 |
| Denoising Text to Speech with Frame-Level Noise Modeling | Dec 17, 2020 | Denoising, text-to-speech | —Unverified | 0 |
| Automatic Arabic Dialect Identification Systems for Written Texts: A Survey | Sep 26, 2020 | Dialect Identification, Machine Translation | —Unverified | 0 |
| Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition | May 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| A Unified Transformer-based Framework for Duplex Text Normalization | Aug 23, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| An End-to-End Neural Network for Image-to-Audio Transformation | Mar 10, 2023 | Image to text, text-to-speech | —Unverified | 0 |
| Deliberation Model for On-Device Spoken Language Understanding | Apr 4, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis | Nov 11, 2019 | Polyphone disambiguation, Speech Synthesis | —Unverified | 0 |
| Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | Jun 6, 2023 | Neural Rendering, text-to-speech | —Unverified | 0 |
| Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese | May 16, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model | Mar 6, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios | Apr 1, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction | Dec 11, 2024 | Decoder, Self-Supervised Learning | —Unverified | 0 |
| UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech | May 15, 2025 | Emotional Speech Synthesis, Language Modeling | —Unverified | 0 |
| A unified front-end framework for English text-to-speech synthesis | May 18, 2023 | Speech Synthesis, Text Normalization | —Unverified | 0 |
| Deep Text-to-Speech System with Seq2Seq Model | Mar 11, 2019 | model, Speech Synthesis | —Unverified | 0 |
| An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space | Nov 6, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech | May 8, 2025 | Style Transfer, text-to-speech | —Unverified | 0 |
| Deep Shallow Fusion for RNN-T Personalization | Nov 16, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Deep Performer: Score-to-Audio Music Performance Synthesis | Feb 12, 2022 | Decoder, Speech Synthesis | —Unverified | 0 |
| A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages | Oct 18, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Deep Feed-forward Sequential Memory Networks for Speech Synthesis | Feb 26, 2018 | speech-recognition, Speech Recognition | —Unverified | 0 |
| Augmenting text for spoken language understanding with Large Language Models | Sep 17, 2023 | Semantic Parsing, Spoken Language Understanding | —Unverified | 0 |
| An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | Jul 14, 2025 | Speech-to-Text, text-to-speech | —Unverified | 0 |
| Deep Denoising Auto-encoder for Statistical Speech Synthesis | Jun 17, 2015 | Denoising, Speech Synthesis | —Unverified | 0 |
| DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation | Mar 28, 2025 | Audio Generation, Audio-Visual Synchronization | —Unverified | 0 |
| Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework | Nov 4, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Debatts: Zero-Shot Debating Text-to-Speech Synthesis | Nov 10, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | Sep 11, 2024 | Adversarial Attack, Audio Synthesis | —Unverified | 0 |
| Augmentation through Laundering Attacks for Audio Spoof Detection | Oct 1, 2024 | Data Augmentation, Face Swapping | —Unverified | 0 |
| Data Redaction from Conditional Generative Models | May 18, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |