| Une aide à la communication par pictogrammes avec prédiction sémantique | Jun 1, 2015 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | Jun 4, 2025 | cross-modal alignment, Lipreading | —Unverified | 0 | 0 |
| Unified speech and gesture synthesis using flow matching | Oct 8, 2023 | Audio Synthesis, Motion Synthesis | —Unverified | 0 | 0 |
| UniFLG: Unified Facial Landmark Generator from Text or Speech | Feb 28, 2023 | Decoder, Face Generation | —Unverified | 0 | 0 |
| Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS) | Jul 4, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion | Jan 10, 2023 | Quantization, text-to-speech | —Unverified | 0 | 0 |
| UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation | Mar 2, 2025 | Decoder, Representation Learning | —Unverified | 0 | 0 |
| Unsupervised Data Validation Methods for Efficient Model Training | Oct 10, 2024 | Data Augmentation, model | —Unverified | 0 | 0 |
| Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages | Aug 11, 2020 | Quantization, text-to-speech | —Unverified | 0 | 0 |
| Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis | Oct 1, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Unsupervised Polyglot Text To Speech | Feb 6, 2019 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Unsupervised pre-training for sequence to sequence speech recognition | Oct 28, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis | Apr 7, 2022 | Quantization, Speech Synthesis | —Unverified | 0 | 0 |
| Unsupervised word-level prosody tagging for controllable speech synthesis | Feb 15, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Controllable Speaking Styles Using a Large Language Model | May 17, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Using Audio Books for Training a Text-to-Speech System | May 1, 2014 | Diversity, Speech Synthesis | —Unverified | 0 | 0 |
| Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition | Jan 6, 2023 | Domain Adaptation, GPU | —Unverified | 0 | 0 |
| Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement | Nov 12, 2020 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Using previous acoustic context to improve Text-to-Speech synthesis | Dec 7, 2020 | Decoder, Speech Synthesis | —Unverified | 0 | 0 |
| Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset | Sep 14, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems | Nov 23, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Using the LARA Little Prince to compare human and TTS audio quality | Jun 1, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech | Nov 28, 2019 | Disentanglement, Expressive Speech Synthesis | —Unverified | 0 | 0 |
| Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction | Jan 3, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Utilizing Speech Emotion Recognition and Recommender Systems for Negative Emotion Handling in Therapy Chatbots | Nov 18, 2023 | Chatbot, Emotion Recognition | —Unverified | 0 | 0 |
| Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE | Jun 6, 2022 | Representation Learning, Speech Representation Learning | —Unverified | 0 | 0 |
| UzbekTagger: The rule-based POS tagger for Uzbek language | Jan 30, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages | May 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers | Jun 8, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment | Jun 12, 2024 | Quantization, Speech Synthesis | —Unverified | 0 | 0 |
| VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech | Jan 25, 2024 | Decoder, Hallucination | —Unverified | 0 | 0 |
| VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention | Feb 12, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| 可變速中文文字轉語音系統 (Variable Speech Rate Mandarin Chinese Text-to-Speech System) [In Chinese] | Mar 1, 2012 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow | Feb 27, 2023 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech | Jun 12, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Vers une annotation automatique de corpus audio pour la synthèse de parole (Towards Fully Automatic Annotation of Audio Books for Text-To-Speech (TTS) Synthesis) [in French] | Jun 1, 2012 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement | Feb 11, 2025 | Disentanglement, text-to-speech | —Unverified | 0 | 0 |
| ViDA-MAN: Visual Dialog with Digital Humans | Oct 26, 2021 | speech-recognition, Speech Recognition | —Unverified | 0 | 0 |
| Vietnamese Text-To-Speech Shared Task VLSP 2020: Remaining problems with state-of-the-art techniques | Dec 1, 2020 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | May 25, 2023 | Decoder, Language Modeling | —Unverified | 0 | 0 |
| Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech | Oct 27, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Nov 26, 2024 | Decoder, multimodal generation | —Unverified | 0 | 0 |
| Visual-Aware Text-to-Speech | Jun 21, 2023 | Rhythm, Speech Synthesis | —Unverified | 0 | 0 |
| VisualSpeech: Enhance Prosody with Visual Context in TTS | Jan 31, 2025 | Prosody Prediction, text-to-speech | —Unverified | 0 | 0 |
| VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over | Oct 7, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer | May 22, 2023 | Decoder, Denoising | —Unverified | 0 | 0 |
| Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise | Mar 20, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| VocalEyes: Enhancing Environmental Perception for the Visually Impaired through Vision-Language Models and Distance-Aware Object Detection | Mar 10, 2025 | NVIDIA Jetson Orin Nano, object-detection | —Unverified | 0 | 0 |
| Voice-Assisted Real-Time Traffic Sign Recognition System Using Convolutional Neural Network | Apr 11, 2024 | Autonomous Vehicles, text-to-speech | —Unverified | 0 | 0 |
| Voice Builder: A Tool for Building Text-To-Speech Voices | May 1, 2018 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |