| Diff-TTS: A Denoising Diffusion Model for Text-to-Speech | Apr 3, 2021 | Denoising, GPU | Unverified | 0 |
| Hi-Fi Multi-Speaker English TTS Dataset | Apr 3, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Attention Forcing for Machine Translation | Apr 2, 2021 | Machine Translation, NMT | Code Available | 0 |
| Fast DCTTS: Efficient Deep Convolutional Text-to-Speech | Apr 1, 2021 | Computational Efficiency, CPU | Unverified | 0 |
| Expressive Text-to-Speech using Style Tag | Apr 1, 2021 | Language Modeling, Language Modelling | Unverified | 0 |
| Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling | Apr 1, 2021 | Decoder, text-to-speech | Unverified | 0 |
| Continual Speaker Adaptation for Text-to-Speech Synthesis | Mar 26, 2021 | Continual Learning, Diversity | Unverified | 0 |
| STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech | Mar 17, 2021 | Speech Synthesis, Style Transfer | Unverified | 0 |
| GAN Vocoder: Multi-Resolution Discriminator Is All You Need | Mar 9, 2021 | All, text-to-speech | Unverified | 0 |
| Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech | Mar 6, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music | Mar 4, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Model architectures to extrapolate emotional expressions in DNN-based text-to-speech | Feb 20, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input | Feb 19, 2021 | Language Modeling, Language Modelling | Unverified | 0 |
| AudioVisual Speech Synthesis: A brief literature review | Feb 18, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention | Feb 12, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning | Feb 10, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram | Feb 3, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet | Jan 30, 2021 | CPU, Sentence | Unverified | 0 |
| Expressive Neural Voice Cloning | Jan 30, 2021 | Speech Synthesis, Style Transfer | Unverified | 0 |
| EmoCat: Language-agnostic Emotional Voice Conversion | Jan 14, 2021 | Decoder, text-to-speech | Unverified | 0 |
| Generating coherent spontaneous speech and gesture from text | Jan 14, 2021 | Gesture Generation, Motion Generation | Unverified | 0 |
| Whispered and Lombard Neural Speech Synthesis | Jan 13, 2021 | Speaker Verification, Speech Synthesis | Unverified | 0 |
| Joint Audio-Visual Deepfake Detection | Jan 1, 2021 | DeepFake Detection, Face Swapping | Unverified | 0 |
| Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention | Dec 29, 2020 | Data Augmentation, text-to-speech | Unverified | 0 |
| Parallel WaveNet conditioned on VAE latent vectors | Dec 17, 2020 | Sentence, Speech Synthesis | Unverified | 0 |
| Denoising Text to Speech with Frame-Level Noise Modeling | Dec 17, 2020 | Denoising, text-to-speech | Unverified | 0 |
| Syntactic representation learning for neural network based TTS with syntactic parse tree traversal | Dec 13, 2020 | Diversity, Representation Learning | Unverified | 0 |
| Using previous acoustic context to improve Text-to-Speech synthesis | Dec 7, 2020 | Decoder, Speech Synthesis | Unverified | 0 |
| MLS: A Large-Scale Multilingual Dataset for Speech Research | Dec 7, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment | Dec 4, 2020 | Meta-Learning, text-to-speech | Code Available | 0 |
| Text-to-speech for the hearing impaired | Dec 3, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis | Dec 3, 2020 | Decoder, Graph Embedding | Unverified | 0 |
| Vietnamese Text-To-Speech Shared Task VLSP 2020: Remaining problems with state-of-the-art techniques | Dec 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Improving prosodic phrasing of Vietnamese text-to-speech systems | Dec 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Development of Smartcall Vietnamese Text-to-Speech for VLSP 2020 | Dec 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio | Nov 25, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge | Nov 25, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech | Nov 24, 2020 | Data Augmentation, Speaker Recognition | Unverified | 0 |
| Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems | Nov 23, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder | Nov 20, 2020 | Model Compression, Quantization | Code Available | 0 |
| Deep Shallow Fusion for RNN-T Personalization | Nov 16, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis | Nov 12, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement | Nov 12, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Low-resource expressive text-to-speech using data augmentation | Nov 11, 2020 | Data Augmentation, text-to-speech | Unverified | 0 |
| Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS | Nov 10, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement | Nov 8, 2020 | Disentanglement, Speech Synthesis | Unverified | 0 |
| Naturalization of Text by the Insertion of Pauses and Filler Words | Nov 7, 2020 | Sentence, text-to-speech | Code Available | 0 |
| Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis | Nov 6, 2020 | Decoder, Sentence | Unverified | 0 |
| Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech | Nov 4, 2020 | Graph Attention, Representation Learning | Unverified | 0 |
| Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time | Nov 4, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |