| MultiSpeech: Multi-Speaker Text to Speech with Transformer | Jun 8, 2020 | Decoder, text-to-speech | Code Available | 1 |
| End-to-End Adversarial Text-to-Speech | Jun 5, 2020 | Adversarial Text, Dynamic Time Warping | Code Available | 1 |
| Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search | May 22, 2020 | text-to-speech, Text to Speech | Code Available | 1 |
| Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis | May 12, 2020 | Speech Synthesis, Style Transfer | Code Available | 1 |
| From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint | May 10, 2020 | Speaker Verification, Speech Synthesis | Code Available | 1 |
| Transformer based Grapheme-to-Phoneme Conversion | Apr 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset | Apr 7, 2020 | Grapheme-to-Phoneme Conversion, Polyphone disambiguation | Code Available | 1 |
| Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0 | Mar 14, 2020 | Clustering, Representation Learning | Code Available | 1 |
| Semi-Supervised Neural Architecture Search | Feb 24, 2020 | GPU, Natural Language Transduction | Code Available | 1 |
| Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining | Dec 14, 2019 | text-to-speech, Text to Speech | Code Available | 1 |
| Attention model for articulatory features detection | Jul 2, 2019 | Manner Of Articulation Detection, model | Code Available | 1 |
| In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data | Apr 4, 2019 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis | Mar 27, 2019 | Emotional Speech Synthesis, Expressive Speech Synthesis | Code Available | 1 |
| End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Feb 18, 2019 | Retrieval, text-to-speech | Code Available | 1 |
| Robust universal neural vocoding | Nov 15, 2018 | text-to-speech, Text to Speech | Code Available | 1 |
| ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech | Jul 19, 2018 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text | Apr 3, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention | Oct 24, 2017 | text-to-speech, Text to Speech | Code Available | 1 |
| VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop | Jul 20, 2017 | Sentence, text-to-speech | Code Available | 1 |
| Tacotron: Towards End-to-End Speech Synthesis | Mar 29, 2017 | Audio Synthesis, Speech Synthesis | Code Available | 1 |
| WaveNet: A Generative Model for Raw Audio | Sep 12, 2016 | Audio Generation, model | Code Available | 1 |
| Hear Your Code Fail, Voice-Assisted Debugging for Python | Jul 20, 2025 | CPU, Medical Diagnosis | Unverified | 0 |
| NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech | Jul 17, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge | Jul 15, 2025 | Speech Enhancement, text-to-speech | Unverified | 0 |
| An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | Jul 14, 2025 | Speech-to-Text, text-to-speech | Unverified | 0 |
| Exploiting Leaderboards for Large-Scale Distribution of Malicious Models | Jul 11, 2025 | Model Discovery, Text Generation | Unverified | 0 |
| MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling | Jul 11, 2025 | Audio Synthesis, Language Modelling | Unverified | 0 |
| Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis | Jul 8, 2025 | Data Augmentation, Mixture-of-Experts | Unverified | 0 |
| An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | Jun 25, 2025 | Speaker Recognition, text-to-speech | Unverified | 0 |
| TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems | Jun 24, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | Jun 20, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Optimizing Multilingual Text-To-Speech with Accents & Emotions | Jun 19, 2025 | Disentanglement, Emotion Recognition | Unverified | 0 |
| Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement | Jun 19, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | Jun 18, 2025 | Sentence, text-to-speech | Unverified | 0 |
| EmoNews: A Spoken Dialogue System for Expressive News Conversations | Jun 16, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling | Jun 14, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech | Jun 14, 2025 | Grapheme-to-Phoneme Conversion, text-to-speech | Unverified | 0 |
| Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | Jun 12, 2025 | Speech-to-Speech Translation, text-to-speech | Unverified | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension, Speech Synthesis | Unverified | 0 |
| UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching | Jun 11, 2025 | Speech Synthesis, text-to-speech | Unverified | 0 |
| A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data | Jun 10, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| Seeing Voices: Generating A-Roll Video from Audio with Mirage | Jun 9, 2025 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation | Jun 9, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Voice Impression Control in Zero-Shot TTS | Jun 6, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Intelligibility of Text-to-Speech Systems for Mathematical Expressions | Jun 5, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning | Jun 5, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| Can we reconstruct a dysarthric voice with the large speech model Parler TTS? | Jun 4, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | Jun 4, 2025 | Data Augmentation, Diversity | Unverified | 0 |
| BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing | Jun 4, 2025 | Quantization, text-to-speech | Unverified | 0 |
| UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | Jun 4, 2025 | cross-modal alignment, Lipreading | Unverified | 0 |