| Word-wise intonation model for cross-language TTS systems | Sep 30, 2024 | Dynamic Time Warping, Prosody Prediction | Unverified | 0 |
| You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation | May 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication | Mar 21, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Zero-shot Cross-lingual Voice Transfer for TTS | Sep 20, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention | Jan 25, 2022 | Form, Speech Synthesis | Unverified | 0 |
| Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling | May 26, 2025 | Sentence, Speech Synthesis | Unverified | 0 |
| Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment | Sep 11, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora | Sep 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Zero-Shot Text-to-Speech for Vietnamese | Jun 2, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model | Apr 24, 2023 | Rhythm, Self-Supervised Learning | Unverified | 0 |
| Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model | Jul 24, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | May 23, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities | May 29, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech | Aug 28, 2023 | Domain Generalization, text-to-speech | Unverified | 0 |
| Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis | Apr 14, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Punjabi Text-To-Speech Synthesis System | Dec 1, 2012 | Speech Synthesis, text-to-speech | Unverified | 0 |
| 運用Python結合語音辨識及合成技術於自動化音文同步之實作 (A Python Implementation of Automatic Speech-text Synchronization Using Speech Recognition and Text-to-Speech Technology) [In Chinese] | Oct 1, 2015 | speech-recognition, Speech Recognition | Unverified | 0 |
| QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis | Mar 14, 2023 | Emotional Speech Synthesis, Sentence | Unverified | 0 |
| RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | Apr 4, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning | Dec 2, 2023 | Decoder, text-to-speech | Unverified | 0 |
| RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations | May 24, 2025 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0 |
| RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis | Oct 29, 2024 | Denoising, Singing Voice Synthesis | Unverified | 0 |
| Reading Assistance through LARA, the Learning And Reading Assistant | Jun 1, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Real-Time Pill Identification for the Visually Impaired Using Deep Learning | May 8, 2024 | Deep Learning, Management | Unverified | 0 |
| ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence | May 9, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis | Sep 8, 2021 | Expressive Speech Synthesis, Sentence | Unverified | 0 |
| Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images | Sep 1, 2017 | Referring Expression, Referring expression generation | Unverified | 0 |
| Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss | Apr 28, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech | Jun 5, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability | Apr 3, 2021 | Emotion Recognition, reinforcement-learning | Unverified | 0 |
| DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models | May 23, 2024 | Image Generation, reinforcement-learning | Unverified | 0 |
| Rep2wav: Noise Robust text-to-speech Using self-supervised representations | Aug 28, 2023 | Speech Enhancement, text-to-speech | Unverified | 0 |
| Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction | Oct 20, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification | Apr 6, 2022 | Attribute, Speaker Verification | Unverified | 0 |
| 中文轉客文文轉音系統中的客語斷詞處理之研究 (Research on Hakka Word Segmentation Processes in Chinese-to-Hakka Text-to-Speech System) [In Chinese] | Oct 1, 2014 | text-to-speech, Text to Speech | Unverified | 0 |
| Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation | Oct 28, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages | May 30, 2023 | Prediction, text-to-speech | Unverified | 0 |
| Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation | Nov 19, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Retrieval-Augmented Audio Deepfake Detection | Apr 22, 2024 | Audio Deepfake Detection, DeepFake Detection | Unverified | 0 |
| ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | Dec 21, 2022 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | Jan 1, 2023 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| Revisiting IPA-based Cross-lingual Text-to-speech | Oct 14, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Revisiting Over-Smoothness in Text to Speech | Feb 26, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis | May 25, 2025 | Speech Synthesis, text-to-speech | Unverified | 0 |
| r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation | Feb 21, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis | Jun 5, 2023 | Rhythm, Sentence | Unverified | 0 |
| R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS | Jun 30, 2022 | Decoder, GPU | Unverified | 0 |
| Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | Jul 2, 2024 | Inference Optimization, Speech Synthesis | Unverified | 0 |
| RSS-TOBI - A Prosodically Enhanced Romanian Speech Corpus | May 1, 2014 | Speech Synthesis, text-to-speech | Unverified | 0 |
| RUSLAN: Russian Spoken Language Corpus for Speech Synthesis | Jun 26, 2019 | Speech Synthesis, text-to-speech | Unverified | 0 |