| Rep2wav: Noise Robust text-to-speech Using self-supervised representations | Aug 28, 2023 | Speech Enhancement, text-to-speech | —Unverified | 0 | 0 |
| Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction | Oct 20, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification | Apr 6, 2022 | Attribute, Speaker Verification | —Unverified | 0 | 0 |
| 中文轉客文文轉音系統中的客語斷詞處理之研究 (Research on Hakka Word Segmentation Processes in Chinese-to-Hakka Text-to-Speech System) [In Chinese] | Oct 1, 2014 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation | Oct 28, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages | May 30, 2023 | Prediction, text-to-speech | —Unverified | 0 | 0 |
| Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation | Nov 19, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Retrieval-Augmented Audio Deepfake Detection | Apr 22, 2024 | Audio Deepfake Detection, DeepFake Detection | —Unverified | 0 | 0 |
| ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | Dec 21, 2022 | Audio-Visual Speech Recognition, Resynthesis | —Unverified | 0 | 0 |
| ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | Jan 1, 2023 | Audio-Visual Speech Recognition, Resynthesis | —Unverified | 0 | 0 |
| Revisiting IPA-based Cross-lingual Text-to-speech | Oct 14, 2021 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Revisiting Over-Smoothness in Text to Speech | Feb 26, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis | May 25, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation | Feb 21, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis | Jun 5, 2023 | Rhythm, Sentence | —Unverified | 0 | 0 |
| R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS | Jun 30, 2022 | Decoder, GPU | —Unverified | 0 | 0 |
| Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | Jul 2, 2024 | Inference Optimization, Speech Synthesis | —Unverified | 0 | 0 |
| RSS-TOBI - A Prosodically Enhanced Romanian Speech Corpus | May 1, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| RUSLAN: Russian Spoken Language Corpus for Speech Synthesis | Jun 26, 2019 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform | Aug 12, 2021 | Speaker Verification, Synthetic Speech Detection | —Unverified | 0 | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension, Speech Synthesis | —Unverified | 0 | 0 |
| Sadeed: Advancing Arabic Diacritization Through Small Language Model | Apr 30, 2025 | Arabic Text Diacritization, Benchmarking | —Unverified | 0 | 0 |
| SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction | Jun 2, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation | Nov 27, 2024 | Question Answering, Speech Enhancement | —Unverified | 0 | 0 |
| SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | Aug 2, 2023 | Decoder, Self-Supervised Learning | —Unverified | 0 | 0 |
| Sample Efficient Adaptive Text-to-Speech | Sep 27, 2018 | Meta-Learning, text-to-speech | —Unverified | 0 | 0 |
| SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech | Jun 24, 2022 | Rhythm, text-to-speech | —Unverified | 0 | 0 |
| SANIP: Shopping Assistant and Navigation for the visually impaired | Sep 8, 2022 | Object, object-detection | —Unverified | 0 | 0 |
| SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate | Jul 13, 2022 | Speech Separation, text-to-speech | —Unverified | 0 | 0 |
| Scalable Multilingual Frontend for TTS | Apr 10, 2020 | Chunking, Machine Translation | —Unverified | 0 | 0 |
| Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling | Dec 19, 2024 | Attribute, Speech Enhancement | —Unverified | 0 | 0 |
| Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis | Dec 6, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement | May 20, 2025 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Seeing Voices: Generating A-Roll Video from Audio with Mirage | Jun 9, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech | Oct 7, 2024 | Computational Efficiency, text-to-speech | —Unverified | 0 | 0 |
| Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information | May 21, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection | Aug 30, 2024 | Self-Supervised Learning, Speech Synthesis | —Unverified | 0 | 0 |
| Self-Attention Linguistic-Acoustic Decoder | Aug 31, 2018 | CPU, Decoder | —Unverified | 0 | 0 |
| Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text | Apr 30, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Semi-Supervised Generative Modeling for Controllable Speech Synthesis | Oct 3, 2019 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Semi-Supervised Learning Based on Reference Model for Low-resource TTS | Oct 25, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation | May 16, 2020 | Decoder, Speech Synthesis | —Unverified | 0 | 0 |
| Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis | Aug 30, 2018 | Decoder, Speech Synthesis | —Unverified | 0 | 0 |
| Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages | Nov 19, 2021 | Data Augmentation, speech-recognition | —Unverified | 0 | 0 |
| Sentence Based Discourse Classification for Hindi Story Text-to-Speech (TTS) System | Dec 1, 2016 | General Classification, Sentence | —Unverified | 0 | 0 |
| Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis | May 18, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody | Jun 29, 2022 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models | Aug 25, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS | Nov 10, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation | Oct 14, 2021 | Generative Adversarial Network, GPU | —Unverified | 0 | 0 |