| A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance | May 1, 2016 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| A Text Normalisation System for Non-Standard English Words | Sep 1, 2017 | Automatic Speech Recognition (ASR), Speech Recognition | —Unverified | 0 |
| A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis | Mar 22, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| A Text to Speech (TTS) System with English to Punjabi Conversion | Nov 13, 2014 | text-to-speech, Text to Speech | —Unverified | 0 |
| A Transfer Learning End-to-End ArabicText-To-Speech (TTS) Deep Architecture | Jul 22, 2020 | Rhythm, Speech Synthesis | —Unverified | 0 |
| Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation | Mar 7, 2024 | Diversity, Machine Translation | —Unverified | 0 |
| Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech | Apr 30, 2024 | Decoder, text-to-speech | —Unverified | 0 |
| AttentionStitch: How Attention Solves the Speech Editing Problem | Mar 5, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms | Nov 9, 2018 | GPU, Image Captioning | —Unverified | 0 |
| Audiobook Dialogues as Training Data for Conversational Style Synthetic Voices | Jun 1, 2022 | Sentence, text-to-speech | —Unverified | 0 |
| Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data | Jun 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Audio Deep Fake Detection System with Neural Stitching for ADD 2022 | Apr 19, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI | Mar 23, 2023 | Speech Enhancement, Speech Synthesis | —Unverified | 0 |
| AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models | May 20, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| AudioVisual Speech Synthesis: A brief literature review | Feb 18, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Augmentation through Laundering Attacks for Audio Spoof Detection | Oct 1, 2024 | Data Augmentation, Face Swapping | —Unverified | 0 |
| Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework | Nov 4, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Augmenting text for spoken language understanding with Large Language Models | Sep 17, 2023 | Semantic Parsing, Spoken Language Understanding | —Unverified | 0 |
| A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages | Oct 18, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| A unified front-end framework for English text-to-speech synthesis | May 18, 2023 | Speech Synthesis, Text Normalization | —Unverified | 0 |
| A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction | Dec 11, 2024 | Decoder, Self-Supervised Learning | —Unverified | 0 |
| A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis | Nov 11, 2019 | Polyphone disambiguation, Speech Synthesis | —Unverified | 0 |
| A Unified Transformer-based Framework for Duplex Text Normalization | Aug 23, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Automatic Arabic Dialect Identification Systems for Written Texts: A Survey | Sep 26, 2020 | Dialect Identification, Machine Translation | —Unverified | 0 |
| Automatic Evaluation of Speaker Similarity | Jul 1, 2022 | Speaker Verification, text-to-speech | —Unverified | 0 |
| Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis | May 29, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners | Feb 28, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Automatic Speech Recognition for Hindi | Jun 26, 2024 | Action Detection, Activity Detection | —Unverified | 0 |
| AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech | Nov 28, 2016 | text-to-speech, Text to Speech | —Unverified | 0 |
| Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | —Unverified | 0 |
| Autoregressive Speech Synthesis with Next-Distribution Prediction | Dec 22, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Autoregressive Speech Synthesis without Vector Quantization | Jul 11, 2024 | Audio Compression, Diversity | —Unverified | 0 |
| Auto Spell Suggestion for High Quality Speech Synthesis in Hindi | Feb 15, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis | Apr 14, 2025 | RAG, Retrieval-augmented Generation | —Unverified | 0 |
| A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers | Apr 16, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation | Nov 17, 2022 | Data Augmentation, Machine Translation | —Unverified | 0 |
| Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS | Oct 9, 2024 | Diversity, Speech Synthesis | —Unverified | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition, Language Modeling | —Unverified | 0 |
| BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data | Feb 12, 2024 | Decoder, Disentanglement | —Unverified | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | —Unverified | 0 |
| Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 | May 22, 2025 | Benchmarking, Dialogue Generation | —Unverified | 0 |
| BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model | Jul 4, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling | Sep 24, 2024 | Articles, text-to-speech | —Unverified | 0 |
| BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing | Jun 4, 2025 | Quantization, text-to-speech | —Unverified | 0 |
| BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation | Jun 4, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization | Feb 4, 2020 | Bayesian Optimization, text-to-speech | —Unverified | 0 |
| Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study | Jun 7, 2024 | Diversity, Language Modeling | —Unverified | 0 |
| Boosting Large Language Model for Speech Synthesis: An Empirical Study | Dec 30, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio | Nov 25, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech | Sep 14, 2019 | text-to-speech, Text to Speech | —Unverified | 0 |