| Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning | Feb 10, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer | Sep 3, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module | Feb 16, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Voice Imitating Text-to-Speech Neural Networks | Jun 4, 2018 | Sentence, text-to-speech | —Unverified | 0 | 0 |
| VoiceLDM: Text-to-Speech with Environmental Context | Sep 24, 2023 | AudioCaps, text-to-speech | —Unverified | 0 | 0 |
| VoiceWukong: Benchmarking Deepfake Voice Detection | Sep 10, 2024 | Benchmarking, Face Swapping | —Unverified | 0 | 0 |
| Voicing Personas: Rewriting Persona Descriptions into Style Prompts for Controllable Text-to-Speech | May 21, 2025 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka | Sep 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing | Aug 11, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature | Apr 2, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | May 19, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes | Nov 29, 2023 | Face Recognition, Face Swapping | —Unverified | 0 | 0 |
| Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder | Jul 31, 2018 | Generative Adversarial Network, Speech Synthesis | —Unverified | 0 | 0 |
| Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks | Oct 30, 2018 | Image Generation, Speech Synthesis | —Unverified | 0 | 0 |
| WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss | Feb 2, 2020 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis | Mar 24, 2023 | Generative Adversarial Network, Speech Synthesis | —Unverified | 0 | 0 |
| WavThruVec: Latent speech representation as intermediate features for neural speech synthesis | Mar 31, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing | Jun 2, 2025 | Keyword Spotting, speech-recognition | —Unverified | 0 | 0 |
| Weakly-supervised text-to-speech alignment confidence measure | Dec 1, 2016 | speech-recognition, Speech Recognition | —Unverified | 0 | 0 |
| Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement | May 30, 2025 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| What happens to diffusion model likelihood when your model is conditional? | Sep 10, 2024 | domain classification, model | —Unverified | 0 | 0 |
| What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS | Sep 4, 2020 | Decoder, Sentence | —Unverified | 0 | 0 |
| What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection | May 23, 2025 | Face Swapping, Sensitivity | —Unverified | 0 | 0 |
| Whispered and Lombard Neural Speech Synthesis | Jan 13, 2021 | Speaker Verification, Speech Synthesis | —Unverified | 0 | 0 |
| Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective | Dec 22, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Word-wise intonation model for cross-language TTS systems | Sep 30, 2024 | Dynamic Time Warping, Prosody Prediction | —Unverified | 0 | 0 |
| You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation | May 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication | Mar 21, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Zero-shot Cross-lingual Voice Transfer for TTS | Sep 20, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention | Jan 25, 2022 | Form, Speech Synthesis | —Unverified | 0 | 0 |
| Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling | May 26, 2025 | Sentence, Speech Synthesis | —Unverified | 0 | 0 |
| Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment | Sep 11, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora | Sep 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Zero-Shot Text-to-Speech for Vietnamese | Jun 2, 2025 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model | Apr 24, 2023 | Rhythm, Self-Supervised Learning | —Unverified | 0 | 0 |
| Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model | Jul 24, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | May 23, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities | May 29, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| The Zero Resource Speech Challenge 2019: TTS without T | Apr 25, 2019 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories | Aug 20, 2019 | Retrieval, TAG | —Unverified | 0 | 0 |
| On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | Jul 31, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Handling Numeric Expressions in Automatic Speech Recognition | Jul 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation | Aug 1, 2024 | Representation Learning, Speech Synthesis | —Unverified | 0 | 0 |
| Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach | Sep 10, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech | May 15, 2025 | Emotional Speech Synthesis, Language Modeling | —Unverified | 0 | 0 |
| Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese | May 16, 2025 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Voice Impression Control in Zero-Shot TTS | Jun 6, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | Jun 12, 2025 | Speech-to-Speech Translation, text-to-speech | —Unverified | 0 | 0 |
| AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | Aug 30, 2024 | DeepFake Detection, Face Swapping | —Unverified | 0 | 0 |
| A Bengali HMM Based Speech Synthesis System | Jun 16, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |