| Speech-T: Transducer for Text to Speech and Beyond | Dec 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Generating Rich Product Descriptions for Conversational E-commerce Systems | Nov 30, 2021 | Sentence, text-to-speech | Unverified | 0 |
| Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance | Nov 23, 2021 | speech-recognition, Speech Recognition | Unverified | 0 |
| Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis | Nov 19, 2021 | Clustering, Decoder | Unverified | 0 |
| Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control | Nov 19, 2021 | Clustering, Data Augmentation | Unverified | 0 |
| Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages | Nov 19, 2021 | Data Augmentation, speech-recognition | Unverified | 0 |
| High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency | Nov 17, 2021 | CPU, Decoder | Unverified | 0 |
| Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech | Nov 16, 2021 | Diversity, text-to-speech | Unverified | 0 |
| Speech Synthesis for Low Resource Languages using Transliteration Enabled Transfer Learning | Nov 16, 2021 | speech-recognition, Speech Recognition | Unverified | 0 |
| Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning | Nov 14, 2021 | Disentanglement, Meta-Learning | Unverified | 0 |
| Speaker Generation | Nov 7, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Emotional Prosody Control for Speech Generation | Nov 7, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit | Nov 1, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Controlling Prosody in End-to-End TTS: A Case Study on Contrastive Focus Generation | Nov 1, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| ViDA-MAN: Visual Dialog with Digital Humans | Oct 26, 2021 | speech-recognition, Speech Recognition | Unverified | 0 |
| DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021 | Oct 25, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech | Oct 24, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| ESPnet2-TTS: Extending the Edge of TTS Research | Oct 15, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Neural Dubber: Dubbing for Videos According to Scripts | Oct 15, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation | Oct 15, 2021 | Data Augmentation, Simultaneous Speech-to-Speech Translation | Unverified | 0 |
| Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech | Oct 14, 2021 | Disentanglement, text-to-speech | Unverified | 0 |
| Revisiting IPA-based Cross-lingual Text-to-speech | Oct 14, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation | Oct 14, 2021 | Generative Adversarial Network, GPU | Unverified | 0 |
| FedSpeech: Federated Text-to-Speech with Continual Learning | Oct 14, 2021 | Continual Learning, Federated Learning | Unverified | 0 |
| Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data | Oct 14, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| A Melody-Unsupervision Model for Singing Voice Synthesis | Oct 13, 2021 | model, Singing Voice Synthesis | Unverified | 0 |
| Systematic Inequalities in Language Technology Performance across the World's Languages | Oct 13, 2021 | Dependency Parsing, Machine Translation | Code Available | 0 |
| Adapting TTS models For New Speakers using Transfer Learning | Oct 12, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis | Oct 9, 2021 | Lifelong learning, Speech Synthesis | Code Available | 0 |
| A study on the efficacy of model pre-training in developing neural text-to-speech system | Oct 8, 2021 | Computational Efficiency, text-to-speech | Unverified | 0 |
| Environment Aware Text-to-Speech Synthesis | Oct 8, 2021 | Attribute, Disentanglement | Unverified | 0 |
| Applying Phonological Features in Multilingual Text-To-Speech | Oct 7, 2021 | Language Acquisition, text-to-speech | Code Available | 0 |
| VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over | Oct 7, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Prosody-TTS: An end-to-end speech synthesis system with prosody control | Oct 6, 2021 | Rhythm, Speech Synthesis | Unverified | 0 |
| Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS | Oct 6, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| GANtron: Emotional Speech Synthesis with Generative Adversarial Networks | Oct 6, 2021 | Emotional Speech Synthesis, Speech Synthesis | Unverified | 0 |
| Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models | Oct 6, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Emphasis control for parallel neural TTS | Oct 6, 2021 | Sentence, text-to-speech | Unverified | 0 |
| On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis | Oct 4, 2021 | Knowledge Distillation, Speech Synthesis | Unverified | 0 |
| Neural Speech Synthesis in German | Oct 3, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Incorporating speaker embedding and post-filter network for improving speaker similarity of personalized speech synthesis system | Oct 1, 2021 | Speaker Verification, Speech Synthesis | Unverified | 0 |
| Conditioning Sequence-to-sequence Networks with Learned Activations | Sep 29, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Guided-TTS: Text-to-Speech with Untranscribed Speech | Sep 29, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis | Sep 27, 2021 | Density Estimation, Speech Synthesis | Unverified | 0 |
| A Proposal of Automatic Error Correction in Text | Sep 24, 2021 | Information Retrieval, Language Modelling | Unverified | 0 |
| Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network | Sep 22, 2021 | Knowledge Distillation, Language Modeling | Unverified | 0 |
| On-device neural speech synthesis | Sep 17, 2021 | GPU, Speech Synthesis | Unverified | 0 |
| fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit | Sep 14, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis | Sep 8, 2021 | Expressive Speech Synthesis, Sentence | Unverified | 0 |
| Integrated Speech and Gesture Synthesis | Aug 25, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |