| ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | Feb 28, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow | Feb 27, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech | Feb 27, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition | Feb 20, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages | Feb 13, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| MAC: A unified framework boosting low resource automatic speech recognition | Feb 5, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| UzbekTagger: The rule-based POS tagger for Uzbek language | Jan 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker | Jan 29, 2023 | Speech Synthesis, text-to-speech | Code Available | 0 |
| On granularity of prosodic representations in expressive text-to-speech | Jan 26, 2023 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0 |
| Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study | Jan 22, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Modelling low-resource accents without accent-specific TTS frontend | Jan 11, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion | Jan 10, 2023 | Quantization, text-to-speech | Unverified | 0 |
| Applying Automated Machine Translation to Educational Video Courses | Jan 9, 2023 | Machine Translation, Speech Synthesis | Unverified | 0 |
| Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition | Jan 6, 2023 | Domain Adaptation, GPU | Unverified | 0 |
| ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | Jan 1, 2023 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| HMM-based data augmentation for E2E systems for building conversational speech synthesis systems | Dec 22, 2022 | Data Augmentation, Language Modeling | Unverified | 0 |
| ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | Dec 21, 2022 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling | Dec 20, 2022 | Form, text-to-speech | Unverified | 0 |
| TTS-Guided Training for Accent Conversion Without Parallel Data | Dec 20, 2022 | Decoder, text-to-speech | Unverified | 0 |
| Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language | Dec 16, 2022 | Language Modeling, Language Modelling | Unverified | 0 |
| Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder | Dec 16, 2022 | Representation Learning, Speech Synthesis | Unverified | 0 |
| Speech Aware Dialog System Technology Challenge (DSTC11) | Dec 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Probing Deep Speaker Embeddings for Speaker-related Tasks | Dec 14, 2022 | Speaker Recognition, Speaker Verification | Unverified | 0 |
| Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue | Dec 7, 2022 | Spoken Dialogue Systems, text-to-speech | Unverified | 0 |
| Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning | Dec 7, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech | Nov 30, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Controllable speech synthesis by learning discrete phoneme-level prosodic representations | Nov 29, 2022 | Clustering, Speech Synthesis | Unverified | 0 |
| Evaluating and reducing the distance between synthetic and real speech distributions | Nov 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Contextual Expressive Text-to-Speech | Nov 26, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Efficient Incremental Text-to-Speech on GPUs | Nov 25, 2022 | GPU, Speech Synthesis | Unverified | 0 |
| IMaSC -- ICFOSS Malayalam Speech Corpus | Nov 23, 2022 | Sentence, text-to-speech | Unverified | 0 |
| PromptTTS: Controllable Text-to-Speech with Text Descriptions | Nov 22, 2022 | Decoder, Speech Synthesis | Code Available | 0 |
| EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance | Nov 17, 2022 | Denoising, text-to-speech | Unverified | 0 |
| Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models | Nov 17, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation | Nov 17, 2022 | Data Augmentation, Machine Translation | Unverified | 0 |
| SNIPER Training: Single-Shot Sparse Training for Text-to-Speech | Nov 14, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations | Nov 11, 2022 | Emotional Speech Synthesis, Speech Synthesis | Unverified | 0 |
| An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space | Nov 6, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Parallel Attention Forcing for Machine Translation | Nov 6, 2022 | Machine Translation, NMT | Unverified | 0 |
| Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech | Nov 4, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features | Nov 1, 2022 | POS, Prediction | Unverified | 0 |
| Generating Multilingual Gender-Ambiguous Text-to-Speech Voices | Nov 1, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages | Nov 1, 2022 | Chunking, Rhythm | Unverified | 0 |
| Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers | Nov 1, 2022 | parameter-efficient fine-tuning, Speech Synthesis | Unverified | 0 |
| Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation | Oct 31, 2022 | Decoder, Disentanglement | Unverified | 0 |
| Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection | Oct 31, 2022 | Audio Compression, Face Swapping | Unverified | 0 |
| Structured State Space Decoder for Speech Recognition and Synthesis | Oct 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders | Oct 28, 2022 | Speaker Verification, text-to-speech | Unverified | 0 |
| Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis | Oct 28, 2022 | Decoder, Diversity | Unverified | 0 |
| Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation | Oct 28, 2022 | text-to-speech, Text to Speech | Unverified | 0 |