| RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform | Aug 12, 2021 | Speaker Verification, Synthetic Speech Detection | —Unverified | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension, Speech Synthesis | —Unverified | 0 |
| Sadeed: Advancing Arabic Diacritization Through Small Language Model | Apr 30, 2025 | Arabic Text Diacritization, Benchmarking | —Unverified | 0 |
| SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction | Jun 2, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation | Nov 27, 2024 | Question Answering, Speech Enhancement | —Unverified | 0 |
| SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | Aug 2, 2023 | Decoder, Self-Supervised Learning | —Unverified | 0 |
| Sample Efficient Adaptive Text-to-Speech | Sep 27, 2018 | Meta-Learning, text-to-speech | —Unverified | 0 |
| SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech | Jun 24, 2022 | Rhythm, text-to-speech | —Unverified | 0 |
| SANIP: Shopping Assistant and Navigation for the visually impaired | Sep 8, 2022 | Object, object-detection | —Unverified | 0 |
| SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate | Jul 13, 2022 | Speech Separation, text-to-speech | —Unverified | 0 |
| Scalable Multilingual Frontend for TTS | Apr 10, 2020 | Chunking, Machine Translation | —Unverified | 0 |
| Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling | Dec 19, 2024 | Attribute, Speech Enhancement | —Unverified | 0 |
| Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis | Dec 6, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement | May 20, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| Seeing Voices: Generating A-Roll Video from Audio with Mirage | Jun 9, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech | Oct 7, 2024 | Computational Efficiency, text-to-speech | —Unverified | 0 |
| Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information | May 21, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection | Aug 30, 2024 | Self-Supervised Learning, Speech Synthesis | —Unverified | 0 |
| Self-Attention Linguistic-Acoustic Decoder | Aug 31, 2018 | CPU, Decoder | —Unverified | 0 |
| Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text | Apr 30, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Semi-Supervised Generative Modeling for Controllable Speech Synthesis | Oct 3, 2019 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Semi-Supervised Learning Based on Reference Model for Low-resource TTS | Oct 25, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation | May 16, 2020 | Decoder, Speech Synthesis | —Unverified | 0 |
| Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis | Aug 30, 2018 | Decoder, Speech Synthesis | —Unverified | 0 |
| Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages | Nov 19, 2021 | Data Augmentation, speech-recognition | —Unverified | 0 |
| Sentence Based Discourse Classification for Hindi Story Text-to-Speech (TTS) System | Dec 1, 2016 | General Classification, Sentence | —Unverified | 0 |
| Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis | May 18, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody | Jun 29, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models | Aug 25, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS | Nov 10, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation | Oct 14, 2021 | Generative Adversarial Network, GPU | —Unverified | 0 |
| Singing Synthesis: with a little help from my attention | Dec 12, 2019 | text-to-speech, Text to Speech | —Unverified | 0 |
| SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow | Apr 10, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs | Jul 18, 2023 | Generative Adversarial Network, Language Modeling | —Unverified | 0 |
| Smart Summarizer for Blind People | Jan 1, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech | Nov 30, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| SNIPER Training: Single-Shot Sparse Training for Text-to-Speech | Nov 14, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| SoK: A Study of the Security on Voice Processing Systems | Dec 24, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis | Apr 6, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Source Tracing of Audio Deepfake Systems | Jul 10, 2024 | Face Swapping, text-to-speech | —Unverified | 0 |
| MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis | Feb 26, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation | Apr 7, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| Speaker-adaptive neural vocoders for parametric speech synthesis systems | Nov 8, 2018 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Speaker Generation | Nov 7, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| Speaker-independent raw waveform model for glottal excitation | Apr 25, 2018 | model, Speech Synthesis | —Unverified | 0 |
| Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis | Jun 3, 2021 | Data Augmentation, Speaker Verification | —Unverified | 0 |
| Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention | Oct 29, 2018 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| SpeakStream: Streaming Text-to-Speech with Interleaved Data | May 25, 2025 | Decoder, text-to-speech | —Unverified | 0 |
| Speak While You Think: Streaming Speech Synthesis During Text Generation | Sep 20, 2023 | Speech Synthesis, Text Generation | —Unverified | 0 |
| Spectral Codecs: Improving Non-Autoregressive Speech Synthesis with Spectrogram-Based Audio Codecs | Jun 7, 2024 | Quantization, Speech Synthesis | —Unverified | 0 |