| Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | Nov 7, 2021 | Meta-Learning, Speech Synthesis | Code Available | 1 |
| FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Oct 18, 2021 | Speech Synthesis, Synthetic Speech Detection | Code Available | 1 |
| Fine-grained style control in Transformer-based Text-to-speech Synthesis | Oct 12, 2021 | Inductive Bias, Speech Synthesis | Code Available | 1 |
| Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech | Oct 8, 2021 | Emotion Interpretation, Expressive Speech Synthesis | Code Available | 1 |
| Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Oct 7, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Oct 6, 2021 | Speech Synthesis, Speech-to-Text | Code Available | 1 |
| Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration | Sep 12, 2021 | Decoder, text-to-speech | Code Available | 1 |
| UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021 | Jul 26, 2021 | Audio Compression, Face Swapping | Code Available | 1 |
| StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion | Jul 21, 2021 | Generative Adversarial Network, text-to-speech | Code Available | 1 |
| EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion | Jul 4, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Jun 29, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| A Survey on Neural Speech Synthesis | Jun 29, 2021 | Speech Synthesis, Survey | Code Available | 1 |
| WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis | Jun 17, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis | Jun 15, 2021 | speech-recognition, Speech Recognition | Code Available | 1 |
| Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling | Jun 11, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| HUI-Audio-Corpus-German: A high quality TTS dataset | Jun 11, 2021 | Text Normalization, text-to-speech | Code Available | 1 |
| Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation | Jun 6, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech | May 13, 2021 | Decoder, Speech Synthesis | Code Available | 1 |
| Wav2KWS: Transfer Learning from Speech Representations for Keyword Spotting | May 10, 2021 | Keyword Spotting, text-to-speech | Code Available | 1 |
| Deep Learning Based Assessment of Synthetic Speech Naturalness | Apr 23, 2021 | Deep Learning, Prediction | Code Available | 1 |
| AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data | Apr 20, 2021 | Decoder, text-to-speech | Code Available | 1 |
| KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset | Apr 17, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction | Apr 16, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems | Apr 15, 2021 | Text Normalization, text-to-speech | Code Available | 1 |
| A Toolbox for Construction and Analysis of Speech Datasets | Apr 11, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model | Apr 2, 2021 | Decoder, text-to-speech | Code Available | 1 |
| Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | Mar 31, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| AdaSpeech: Adaptive Text to Speech for Custom Voice | Mar 1, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search | Feb 8, 2021 | CPU, Model Compression | Code Available | 1 |
| Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech | Jan 1, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Unified Mandarin TTS Front-end Based on Distilled BERT Model | Dec 31, 2020 | Knowledge Distillation, Language Modeling | Code Available | 1 |
| Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities | Dec 1, 2020 | Chinese Word Segmentation, Speech Synthesis | Code Available | 1 |
| Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains | Nov 19, 2020 | text-to-speech, Text to Speech | Code Available | 1 |
| Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis | Nov 6, 2020 | Decoder, Speech Synthesis | Code Available | 1 |
| Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities | Nov 5, 2020 | Chinese Word Segmentation, Speech Synthesis | Code Available | 1 |
| StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization | Nov 3, 2020 | Spectral Reconstruction, text-to-speech | Code Available | 1 |
| IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Nov 1, 2020 | Dynamic Time Warping, Machine Translation | Code Available | 1 |
| Effective Deep Learning Models for Automatic Diacritization of Arabic Text | Nov 1, 2020 | Arabic Text Diacritization, Decoder | Code Available | 1 |
| One-class learning towards generalized voice spoofing detection | Oct 27, 2020 | Speaker Verification, text-to-speech | Code Available | 1 |
| Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview | Oct 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling | Oct 8, 2020 | Speech Recognition, text-to-speech | Code Available | 1 |
| Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries | Sep 21, 2020 | Sentence, text-to-speech | Code Available | 1 |
| Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Aug 13, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding | Aug 12, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions | Aug 9, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Pretraining Techniques for Sequence-to-Sequence Voice Conversion | Aug 7, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Phonological Features for 0-shot Multilingual Speech Synthesis | Aug 6, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech | Aug 3, 2020 | Meta-Learning, Speech Synthesis | Code Available | 1 |
| FastPitch: Parallel Text-to-speech with Pitch Prediction | Jun 11, 2020 | Prediction, text-to-speech | Code Available | 1 |
| FastSpeech 2: Fast and High-Quality End-to-End Text to Speech | Jun 8, 2020 | Knowledge Distillation, Speech Synthesis | Code Available | 1 |