| A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | Jun 4, 2025 | Data Augmentation, Diversity | —Unverified | 0 |
| A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation | Jun 10, 2022 | Machine Translation, text-to-speech | —Unverified | 0 |
| Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System | Oct 5, 2024 | Adversarial Purification, Speech Synthesis | —Unverified | 0 |
| Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | Jun 12, 2025 | Speech-to-Speech Translation, text-to-speech | —Unverified | 0 |
| Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling | Sep 24, 2024 | Articles, text-to-speech | —Unverified | 0 |
| A Novel Approach to OCR using Image Recognition based Classification for Ancient Tamil Inscriptions in Temples | Jul 4, 2019 | Binarization, General Classification | —Unverified | 0 |
| Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | Dec 25, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | Jul 31, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model | Jul 4, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis | Jan 22, 2024 | Speaker Verification, Speech Synthesis | —Unverified | 0 |
| Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 | May 22, 2025 | Benchmarking, Dialogue Generation | —Unverified | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | —Unverified | 0 |
| An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis | Jun 3, 2021 | Speaker Verification, Speech Synthesis | —Unverified | 0 |
| A Challenge Set and Methods for Noun-Verb Ambiguity | Oct 1, 2018 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS | Oct 24, 2022 | Data Augmentation, GPU | —Unverified | 0 |
| Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters | Jun 19, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| DNN-based Speech Synthesis for Indian Languages from ASCII text | Aug 18, 2016 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Efficient data selection employing Semantic Similarity-based Graph Structures for model training | Feb 22, 2024 | Semantic Similarity, Semantic Textual Similarity | —Unverified | 0 |
| BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data | Feb 12, 2024 | Decoder, Disentanglement | —Unverified | 0 |
| An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS | Jun 9, 2024 | Denoising, Speech Denoising | —Unverified | 0 |
| A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data | Jan 21, 2025 | Domain Adaptation, speech-recognition | —Unverified | 0 |
| Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Dec 13, 2024 | Conditional Image Generation, Image Generation | —Unverified | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition, Language Modeling | —Unverified | 0 |
| Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS | Oct 9, 2024 | Diversity, Speech Synthesis | —Unverified | 0 |
| An In-depth Analysis of the Effect of Text Normalization in Social Media | May 1, 2015 | Dependency Parsing, named-entity-recognition | —Unverified | 0 |
| Discovering the Italian literature: interactive access to audio indexed text resources | May 1, 2014 | Cultural Vocal Bursts Intensity Prediction, Sentence | —Unverified | 0 |
| Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation | Nov 17, 2022 | Data Augmentation, Machine Translation | —Unverified | 0 |
| Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT | Jan 2, 2025 | Polyphone disambiguation, Sentence | —Unverified | 0 |
| Direct Text to Speech Translation System using Acoustic Units | Sep 14, 2023 | Decoder, Speech-to-Speech Translation | —Unverified | 0 |
| An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer | Jan 4, 2018 | Neural Network simulation, text-to-speech | —Unverified | 0 |
| Voice Impression Control in Zero-Shot TTS | Jun 6, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Efficient Incremental Text-to-Speech on GPUs | Nov 25, 2022 | GPU, Speech Synthesis | —Unverified | 0 |
| A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers | Apr 16, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Direct Speech to Speech Translation: A Review | Mar 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | Jun 25, 2025 | Speaker Recognition, text-to-speech | —Unverified | 0 |
| DiscreTalk: Text-to-Speech as a Machine Translation Problem | May 12, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech | Oct 24, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing | Jun 4, 2024 | Decoder, Language Modeling | —Unverified | 0 |
| Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization | Oct 30, 2018 | Data Augmentation, Disentanglement | —Unverified | 0 |
| DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction | May 26, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage | Jun 13, 2024 | Sentence, text-to-speech | —Unverified | 0 |
| Distribution augmentation for low-resource expressive text-to-speech | Feb 13, 2022 | Data Augmentation, Diversity | —Unverified | 0 |
| Digital Einstein Experience: Fast Text-to-Speech for Conversational AI | Jul 21, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | Oct 14, 2024 | Denoising, Speaker Verification | —Unverified | 0 |
| DiffVoice: Text-to-Speech with Latent Diffusion | Apr 23, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Does Audio Deepfake Detection Generalize? | Mar 30, 2022 | Audio Deepfake Detection, DeepFake Detection | —Unverified | 0 |
| Do Prosody Transfer Models Transfer Prosody? | Mar 7, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech | Sep 18, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| DPP-TTS: Diversifying prosodic features of speech via determinantal point processes | Oct 23, 2023 | Diversity, Point Processes | —Unverified | 0 |
| AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis | Apr 14, 2025 | RAG, Retrieval-augmented Generation | —Unverified | 0 |