| Attention Forcing for Machine Translation | Apr 2, 2021 | Machine Translation, NMT | Code Available | 0 |
| Expressive Text-to-Speech using Style Tag | Apr 1, 2021 | Language Modeling, Language Modelling | Unverified | 0 |
| Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling | Apr 1, 2021 | Decoder, text-to-speech | Unverified | 0 |
| Fast DCTTS: Efficient Deep Convolutional Text-to-Speech | Apr 1, 2021 | Computational Efficiency, CPU | Unverified | 0 |
| Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | Mar 31, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Continual Speaker Adaptation for Text-to-Speech Synthesis | Mar 26, 2021 | Continual Learning, Diversity | Unverified | 0 |
| STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech | Mar 17, 2021 | Speech Synthesis, Style Transfer | Unverified | 0 |
| GAN Vocoder: Multi-Resolution Discriminator Is All You Need | Mar 9, 2021 | All, text-to-speech | Unverified | 0 |
| Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech | Mar 6, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music | Mar 4, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| AdaSpeech: Adaptive Text to Speech for Custom Voice | Mar 1, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Model architectures to extrapolate emotional expressions in DNN-based text-to-speech | Feb 20, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input | Feb 19, 2021 | Language Modeling, Language Modelling | Unverified | 0 |
| AudioVisual Speech Synthesis: A brief literature review | Feb 18, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention | Feb 12, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning | Feb 10, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search | Feb 8, 2021 | CPU, Model Compression | Code Available | 1 |
| Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram | Feb 3, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Expressive Neural Voice Cloning | Jan 30, 2021 | Speech Synthesis, Style Transfer | Unverified | 0 |
| Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet | Jan 30, 2021 | CPU, Sentence | Unverified | 0 |
| EmoCat: Language-agnostic Emotional Voice Conversion | Jan 14, 2021 | Decoder, text-to-speech | Unverified | 0 |
| Generating coherent spontaneous speech and gesture from text | Jan 14, 2021 | Gesture Generation, Motion Generation | Unverified | 0 |
| Whispered and Lombard Neural Speech Synthesis | Jan 13, 2021 | Speaker Verification, Speech Synthesis | Unverified | 0 |
| Joint Audio-Visual Deepfake Detection | Jan 1, 2021 | DeepFake Detection, Face Swapping | Unverified | 0 |
| Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech | Jan 1, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Unified Mandarin TTS Front-end Based on Distilled BERT Model | Dec 31, 2020 | Knowledge Distillation, Language Modeling | Code Available | 1 |
| Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention | Dec 29, 2020 | Data Augmentation, text-to-speech | Unverified | 0 |
| Denoising Text to Speech with Frame-Level Noise Modeling | Dec 17, 2020 | Denoising, text-to-speech | Unverified | 0 |
| Parallel WaveNet conditioned on VAE latent vectors | Dec 17, 2020 | Sentence, Speech Synthesis | Unverified | 0 |
| Syntactic representation learning for neural network based TTS with syntactic parse tree traversal | Dec 13, 2020 | Diversity, Representation Learning | Unverified | 0 |
| Using previous acoustic context to improve Text-to-Speech synthesis | Dec 7, 2020 | Decoder, Speech Synthesis | Unverified | 0 |
| MLS: A Large-Scale Multilingual Dataset for Speech Research | Dec 7, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment | Dec 4, 2020 | Meta-Learning, text-to-speech | Code Available | 0 |
| GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis | Dec 3, 2020 | Decoder, Graph Embedding | Unverified | 0 |
| Text-to-speech for the hearing impaired | Dec 3, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Vietnamese Text-To-Speech Shared Task VLSP 2020: Remaining problems with state-of-the-art techniques | Dec 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Development of Smartcall Vietnamese Text-to-Speech for VLSP 2020 | Dec 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Improving prosodic phrasing of Vietnamese text-to-speech systems | Dec 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities | Dec 1, 2020 | Chinese Word Segmentation, Speech Synthesis | Code Available | 1 |
| FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge | Nov 25, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio | Nov 25, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech | Nov 24, 2020 | Data Augmentation, Speaker Recognition | Unverified | 0 |
| Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems | Nov 23, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder | Nov 20, 2020 | Model Compression, Quantization | Code Available | 0 |
| Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains | Nov 19, 2020 | text-to-speech, Text to Speech | Code Available | 1 |
| Deep Shallow Fusion for RNN-T Personalization | Nov 16, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis | Nov 12, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement | Nov 12, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Low-resource expressive text-to-speech using data augmentation | Nov 11, 2020 | Data Augmentation, text-to-speech | Unverified | 0 |
| Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS | Nov 10, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |