| An overview of text-to-speech systems and media applications | Oct 22, 2023 | Acoustic Modelling, text-to-speech | Unverified | 0 | 0 |
| Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | May 16, 2024 | Hallucination, Language Modeling | Unverified | 0 | 0 |
| Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Dec 13, 2024 | Conditional Image Generation, Image Generation | Unverified | 0 | 0 |
| Explicit Intensity Control for Accented Text-to-speech | Oct 27, 2022 | speech-recognition, Speech Recognition | Unverified | 0 | 0 |
| Efficient data selection employing Semantic Similarity-based Graph Structures for model training | Feb 22, 2024 | Semantic Similarity, Semantic Textual Similarity | Unverified | 0 | 0 |
| Exploiting Transliterated Words for Finding Similarity in Inter-Language News Articles using Machine Learning | May 29, 2022 | Articles, Machine Translation | Unverified | 0 | 0 |
| Exploring an Inter-Pausal Unit (IPU) based Approach for Indic End-to-End TTS Systems | Sep 18, 2024 | Sentence, text-to-speech | Unverified | 0 | 0 |
| Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation | Apr 8, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Exploring Speech Enhancement for Low-resource Speech Synthesis | Sep 19, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Exploring speech style spaces with language models: Emotional TTS without emotion labels | May 18, 2024 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study | Jun 7, 2024 | Diversity, Language Modeling | Unverified | 0 | 0 |
| Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment | Oct 28, 2019 | Hard Attention, Speech Synthesis | Unverified | 0 | 0 |
| BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization | Feb 4, 2020 | Bayesian Optimization, text-to-speech | Unverified | 0 | 0 |
| An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era | Oct 6, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech | Oct 12, 2022 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition | Mar 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation | Jun 4, 2024 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Effective Decoder Masking for Transformer Based End-to-End Speech Recognition | Oct 27, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing | Jun 4, 2025 | Quantization, text-to-speech | Unverified | 0 | 0 |
| A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | Jun 4, 2025 | Data Augmentation, Diversity | Unverified | 0 | 0 |
| Easy, Interpretable, Effective: openSMILE for voice deepfake detection | Aug 28, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 | 0 |
| E3 TTS: Easy End-to-End Diffusion-based Text to Speech | Nov 2, 2023 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation | Jun 10, 2022 | Machine Translation, text-to-speech | Unverified | 0 | 0 |
| Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System | Oct 5, 2024 | Adversarial Purification, Speech Synthesis | Unverified | 0 | 0 |
| Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | Jun 12, 2025 | Speech-to-Speech Translation, text-to-speech | Unverified | 0 | 0 |
| E1 TTS: Simple and Fast Non-Autoregressive TTS | Sep 14, 2024 | Denoising, text-to-speech | Unverified | 0 | 0 |
| Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection | Dec 2, 2019 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis | Sep 22, 2023 | Denoising, Speech Synthesis | Unverified | 0 | 0 |
| Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling | Sep 24, 2024 | Articles, text-to-speech | Unverified | 0 | 0 |
| A Novel Approach to OCR using Image Recognition based Classification for Ancient Tamil Inscriptions in Temples | Jul 4, 2019 | Binarization, General Classification | Unverified | 0 | 0 |
| DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis | Oct 17, 2024 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech | Feb 27, 2023 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | Jun 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | Dec 25, 2024 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Dual Supervised Learning | Jul 3, 2017 | General Classification, image-classification | Unverified | 0 | 0 |
| DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance | Aug 26, 2024 | Diversity, text-to-speech | Unverified | 0 | 0 |
| BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model | Jul 4, 2022 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Dual Script E2E framework for Multilingual and Code-Switching ASR | Jun 2, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Dual Audio-Centric Modality Coupling for Talking Head Generation | Mar 26, 2025 | NeRF, Talking Head Generation | Unverified | 0 | 0 |
| Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy | Oct 13, 2022 | Generative Adversarial Network, Speaker anonymization | Unverified | 0 | 0 |
| DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction | Mar 1, 2023 | Dynamic Time Warping, Metric Learning | Unverified | 0 | 0 |
| DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | Jun 25, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 | May 22, 2025 | Benchmarking, Dialogue Generation | Unverified | 0 | 0 |
| DPP-TTS: Diversifying prosodic features of speech via determinantal point processes | Oct 23, 2023 | Diversity, Point Processes | Unverified | 0 | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | Unverified | 0 | 0 |
| An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis | Jun 3, 2021 | Speaker Verification, Speech Synthesis | Unverified | 0 | 0 |
| Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis | Jan 22, 2024 | Speaker Verification, Speech Synthesis | Unverified | 0 | 0 |
| DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech | Sep 18, 2024 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Do Prosody Transfer Models Transfer Prosody? | Mar 7, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Does Audio Deepfake Detection Generalize? | Mar 30, 2022 | Audio Deepfake Detection, DeepFake Detection | Unverified | 0 | 0 |