| A Survey on Neural Speech Synthesis | Jun 29, 2021 | Speech Synthesis, Survey | Code Available | 1 |
| FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Jun 29, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis | Jun 29, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech | Jun 24, 2021 | Generative Adversarial Network, text-to-speech | Unverified | 0 |
| Non-native English lexicon creation for bilingual speech synthesis | Jun 21, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters | Jun 19, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis | Jun 17, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model | Jun 17, 2021 | Emotional Speech Synthesis, Emotion Classification | Unverified | 0 |
| Improving the expressiveness of neural vocoding with non-affine Normalizing Flows | Jun 16, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| ADEPT: A Dataset for Evaluating Prosody Transfer | Jun 15, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis | Jun 15, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis | Jun 15, 2021 | speech-recognition, Speech Recognition | Code Available | 1 |
| UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation | Jun 15, 2021 | Speech Synthesis, text-to-speech | Code Available | 3 |
| A learned conditional prior for the VAE acoustic space of a TTS system | Jun 14, 2021 | Sentence, text-to-speech | Unverified | 0 |
| SynthASR: Unlocking Synthetic Data for Speech Recognition | Jun 14, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| HUI-Audio-Corpus-German: A high quality TTS dataset | Jun 11, 2021 | Text Normalization, text-to-speech | Code Available | 1 |
| Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling | Jun 11, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows | Jun 10, 2021 | Disentanglement, Sentence | Unverified | 0 |
| Speech BERT Embedding For Improving Prosody in Neural TTS | Jun 8, 2021 | Decoder, text-to-speech | Unverified | 0 |
| Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios | Jun 7, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation | Jun 6, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech | Jun 5, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis | Jun 3, 2021 | Speaker Verification, Speech Synthesis | Unverified | 0 |
| Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis | Jun 3, 2021 | Data Augmentation, Speaker Verification | Unverified | 0 |
| Dual Script E2E framework for Multilingual and Code-Switching ASR | Jun 2, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Corpus of Neutral Voice Speech in Brazilian Portuguese | May 21, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech | May 13, 2021 | Decoder, Speech Synthesis | Code Available | 1 |
| Wav2KWS: Transfer Learning from Speech Representations for Keyword Spotting | May 10, 2021 | Keyword Spotting, text-to-speech | Code Available | 1 |
| Learning Robust Latent Representations for Controllable Speech Synthesis | May 10, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism | May 6, 2021 | Generative Adversarial Network, Singing Voice Synthesis | Code Available | 2 |
| Talrómur: A large Icelandic TTS corpus | May 1, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| On Addressing Practical Challenges for RNN-Transducer | Apr 27, 2021 | speech-recognition, Speech Recognition | Unverified | 0 |
| Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis | Apr 26, 2021 | Language Modeling, Language Modelling | Code Available | 0 |
| Deep Learning Based Assessment of Synthetic Speech Naturalness | Apr 23, 2021 | Deep Learning, Prediction | Code Available | 1 |
| AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data | Apr 20, 2021 | Decoder, text-to-speech | Code Available | 1 |
| KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset | Apr 17, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction | Apr 16, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems | Apr 15, 2021 | Text Normalization, text-to-speech | Code Available | 1 |
| Non-autoregressive sequence-to-sequence voice conversion | Apr 14, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis | Apr 14, 2021 | Dependency Parsing, Representation Learning | Unverified | 0 |
| Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures | Apr 12, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Toolbox for Construction and Analysis of Speech Datasets | Apr 11, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation | Apr 8, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features | Apr 8, 2021 | Decoder, Speech Synthesis | Unverified | 0 |
| Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects | Apr 8, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| AI4D -- African Language Program | Apr 6, 2021 | Machine Translation, speech-recognition | Code Available | 0 |
| Hi-Fi Multi-Speaker English TTS Dataset | Apr 3, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability | Apr 3, 2021 | Emotion Recognition, reinforcement-learning | Unverified | 0 |
| Diff-TTS: A Denoising Diffusion Model for Text-to-Speech | Apr 3, 2021 | Denoising, GPU | Unverified | 0 |
| SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model | Apr 2, 2021 | Decoder, text-to-speech | Code Available | 1 |