| A Comparative Analysis of Pretrained Language Models for Text-to-Speech | Sep 4, 2023 | Natural Language Understanding, Prediction | —Unverified | 0 |
| A Context-Based Numerical Format Prediction for a Text-To-Speech System | Nov 19, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| A Corpus of Neutral Voice Speech in Brazilian Portuguese | May 21, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| A Cost Efficient Approach to Correct OCR Errors in Large Document Collections | May 28, 2019 | Clustering, Language Modelling | —Unverified | 0 |
| Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning | Sep 15, 2024 | Multi-Task Learning, text-to-speech | —Unverified | 0 |
| A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System | Jul 13, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN | May 12, 2020 | Few-Shot Learning, text-to-speech | —Unverified | 0 |
| Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data | Oct 25, 2022 | Decoder, Disentanglement | —Unverified | 0 |
| Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers | Nov 1, 2022 | parameter-efficient fine-tuning, Speech Synthesis | —Unverified | 0 |
| Adapting TTS models For New Speakers using Transfer Learning | Oct 12, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| Adaptive re-calibration of channel-wise features for Adversarial Audio Classification | Oct 21, 2022 | Audio Classification, Face Swapping | —Unverified | 0 |
| AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style | Jul 6, 2021 | Decoder, Mixture-of-Experts | —Unverified | 0 |
| AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios | Apr 1, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | Jun 6, 2023 | Neural Rendering, text-to-speech | —Unverified | 0 |
| A Deep Generative Acoustic Model for Compositional Automatic Speech Recognition | Oct 23, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| ADEPT: A Dataset for Evaluating Prosody Transfer | Jun 15, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data | Jan 21, 2025 | Domain Adaptation, speech-recognition | —Unverified | 0 |
| Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters | Jun 19, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis | Jan 22, 2024 | Speaker Verification, Speech Synthesis | —Unverified | 0 |
| Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | Dec 25, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System | Oct 5, 2024 | Adversarial Purification, Speech Synthesis | —Unverified | 0 |
| Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech | Oct 12, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| Adversarial speech for voice privacy protection from Personalized Speech generation | Jan 22, 2024 | Speaker Verification, text-to-speech | —Unverified | 0 |
| Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting | Aug 20, 2024 | Keyword Spotting, text-to-speech | —Unverified | 0 |
| 台語古詩朗誦系統 (A Taiwanese Text-to-Speech System for Ancient Poems) [In Chinese] | Oct 1, 2018 | text-to-speech, Text to Speech | —Unverified | 0 |
| AE-Flow: AutoEncoder Normalizing Flow | Dec 27, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis | Aug 16, 2023 | Attribute, Speech Synthesis | —Unverified | 0 |
| A Framework for Synthetic Audio Conversations Generation using Large Language Models | Sep 2, 2024 | Audio Classification, Audio Tagging | —Unverified | 0 |
| A Fully Time-domain Neural Model for Subband-based Speech Synthesizer | Oct 22, 2018 | text-to-speech, Text to Speech | —Unverified | 0 |
| A Generative Model of a Pronunciation Lexicon for Hindi | May 6, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer | Jun 6, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru | Feb 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| AI-Powered Assistive Technologies for Visual Impairment | Jan 14, 2025 | Object Recognition, text-to-speech | —Unverified | 0 |
| A Language Modeling Approach to Diacritic-Free Hebrew TTS | Jul 16, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| A Large-Scale User Study of an Alexa Prize Chatbot: Effect of TTS Dynamism on Perceived Quality of Social Dialog | Sep 1, 2019 | Chatbot, text-to-speech | —Unverified | 0 |
| A learned conditional prior for the VAE acoustic space of a TTS system | Jun 14, 2021 | Sentence, text-to-speech | —Unverified | 0 |
| Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration | Dec 11, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Almost Unsupervised Text to Speech and Automatic Speech Recognition | May 13, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input | Feb 19, 2021 | Language Modeling, Language Modelling | —Unverified | 0 |
| A Mask-based Model for Mandarin Chinese Polyphone Disambiguation | Oct 21, 2020 | Polyphone disambiguation, text-to-speech | —Unverified | 0 |
| A Melody-Unsupervision Model for Singing Voice Synthesis | Oct 13, 2021 | model, Singing Voice Synthesis | —Unverified | 0 |
| A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach | Jul 5, 2019 | text-to-speech, Text to Speech | —Unverified | 0 |
| A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | Apr 22, 2025 | cross-modal alignment, Script Generation | —Unverified | 0 |
| A multilingual training strategy for low resource Text to Speech | Sep 2, 2024 | Cross-Lingual Transfer, text-to-speech | —Unverified | 0 |
| A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge | Jun 22, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation | Dec 13, 2024 | Data Augmentation, Sarcasm Detection | —Unverified | 0 |
| An adaptable task-oriented dialog system for stand-alone embedded devices | Jul 1, 2019 | Dialogue Management, Management | —Unverified | 0 |
| An Algorithm Based on Empirical Methods, for Text-to-Tuneful-Speech Synthesis of Sanskrit Verse | Sep 15, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue | Dec 7, 2022 | Spoken Dialogue Systems, text-to-speech | —Unverified | 0 |
| An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | Jul 14, 2025 | Speech-to-Text, text-to-speech | —Unverified | 0 |