| A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI | Mar 23, 2023 | Speech Enhancement, Speech Synthesis | Unverified | 0 |
| Code-Switching Text Generation and Injection in Mandarin-English ASR | Mar 20, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Cross-speaker Emotion Transfer by Manipulating Speech Style Latents | Mar 15, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Controllable Prosody Generation With Partial Inputs | Mar 14, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis | Mar 14, 2023 | Emotional Speech Synthesis, Sentence | Unverified | 0 |
| An End-to-End Neural Network for Image-to-Audio Transformation | Mar 10, 2023 | Image to text, text-to-speech | Unverified | 0 |
| Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports | Mar 9, 2023 | text-to-speech, Text to Speech | Code Available | 0 |
| Do Prosody Transfer Models Transfer Prosody? | Mar 7, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Mar 7, 2023 | In-Context Learning, Language Modeling | Code Available | 5 |
| FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model | Mar 6, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations | Mar 3, 2023 | Speech Denoising, Speech Enhancement | Code Available | 1 |
| Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding | Mar 2, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities | Mar 2, 2023 | Learning-To-Rank, text-to-speech | Unverified | 0 |
| LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion | Mar 2, 2023 | Grapheme-to-Phoneme Conversion, speech-recognition | Unverified | 0 |
| Leveraging Large Text Corpora for End-to-End Speech Summarization | Mar 2, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations | Mar 1, 2023 | Self-Supervised Learning, Speech Synthesis | Unverified | 0 |
| DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction | Mar 1, 2023 | Dynamic Time Warping, Metric Learning | Unverified | 0 |
| ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | Feb 28, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners | Feb 28, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis | Feb 28, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| UniFLG: Unified Facial Landmark Generator from Text or Speech | Feb 28, 2023 | Decoder, Face Generation | Unverified | 0 |
| Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech | Feb 27, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow | Feb 27, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech | Feb 27, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS | Feb 24, 2023 | Decoder, text-to-speech | Code Available | 2 |
| Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition | Feb 20, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages | Feb 13, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | Feb 8, 2023 | Code Generation, Diversity | Code Available | 2 |
| MAC: A unified framework boosting low resource automatic speech recognition | Feb 5, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| UzbekTagger: The rule-based POS tagger for Uzbek language | Jan 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining | Jan 30, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker | Jan 29, 2023 | Speech Synthesis, text-to-speech | Code Available | 0 |
| On granularity of prosodic representations in expressive text-to-speech | Jan 26, 2023 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0 |
| Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study | Jan 22, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions | Jan 20, 2023 | text-to-speech, Text to Speech | Code Available | 5 |
| Modelling low-resource accents without accent-specific TTS frontend | Jan 11, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion | Jan 10, 2023 | Quantization, text-to-speech | Unverified | 0 |
| Applying Automated Machine Translation to Educational Video Courses | Jan 9, 2023 | Machine Translation, Speech Synthesis | Unverified | 0 |
| Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition | Jan 6, 2023 | Domain Adaptation, GPU | Unverified | 0 |
| Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | Jan 5, 2023 | In-Context Learning, Language Modeling | Code Available | 7 |
| ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | Jan 1, 2023 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech | Dec 30, 2022 | Denoising, text-to-speech | Code Available | 1 |
| StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models | Dec 29, 2022 | Data Augmentation, text-to-speech | Code Available | 1 |
| HMM-based data augmentation for E2E systems for building conversational speech synthesis systems | Dec 22, 2022 | Data Augmentation, Language Modeling | Unverified | 0 |
| ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | Dec 21, 2022 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling | Dec 20, 2022 | Form, text-to-speech | Unverified | 0 |
| TTS-Guided Training for Accent Conversion Without Parallel Data | Dec 20, 2022 | Decoder, text-to-speech | Unverified | 0 |
| Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder | Dec 16, 2022 | Representation Learning, Speech Synthesis | Unverified | 0 |
| Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language | Dec 16, 2022 | Language Modeling, Language Modelling | Unverified | 0 |
| Speech Aware Dialog System Technology Challenge (DSTC11) | Dec 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |