| GraphTTS: graph-to-sequence modelling in neural text-to-speech | Mar 4, 2020 | Graph Embedding, Graph-to-Sequence | Unverified | 0 |
| Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis | Feb 28, 2020 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Semi-Supervised Neural Architecture Search | Feb 24, 2020 | GPU, Natural Language Transduction | Code Available | 1 |
| On the Discrepancy between Density Estimation and Sequence Generation | Feb 17, 2020 | Density Estimation, Machine Translation | Unverified | 0 |
| Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis | Feb 6, 2020 | Disentanglement, Speech Synthesis | Unverified | 0 |
| Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior | Feb 6, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization | Feb 4, 2020 | Bayesian Optimization, text-to-speech | Unverified | 0 |
| WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss | Feb 2, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network | Jan 31, 2020 | Quantization, Speech Synthesis | Unverified | 0 |
| From Speech-to-Speech Translation to Automatic Dubbing | Jan 19, 2020 | Machine Translation, Speech-to-Speech Translation | Unverified | 0 |
| Smart Summarizer for Blind People | Jan 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Parallel Neural Text-to-Speech | Jan 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems | Dec 19, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining | Dec 14, 2019 | text-to-speech, Text to Speech | Code Available | 1 |
| Singing Synthesis: with a little help from my attention | Dec 12, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| Neural Voice Puppetry: Audio-driven Facial Reenactment | Dec 11, 2019 | Face Model, Neural Rendering | Code Available | 0 |
| Semantic Mask for Transformer based End-to-End Speech Recognition | Dec 6, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Towards Robust Neural Vocoding for Speech Generation: A Survey | Dec 5, 2019 | Speech Synthesis, Survey | Unverified | 0 |
| Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection | Dec 2, 2019 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech | Nov 28, 2019 | Disentanglement, Expressive Speech Synthesis | Unverified | 0 |
| Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers | Nov 26, 2019 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features | Nov 21, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| Independent and automatic evaluation of acoustic-to-articulatory inversion models | Nov 15, 2019 | speech-recognition, Speech Recognition | Code Available | 0 |
| Emotional Voice Conversion using Multitask Learning with Text-to-speech | Nov 11, 2019 | Decoder, text-to-speech | Code Available | 0 |
| A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis | Nov 11, 2019 | Polyphone disambiguation, Speech Synthesis | Unverified | 0 |
| Teacher-Student Training for Robust Tacotron-based TTS | Nov 7, 2019 | Decoder, Knowledge Distillation | Unverified | 0 |
| Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework | Nov 7, 2019 | Sentence, Speech Synthesis | Unverified | 0 |
| A System for Diacritizing Four Varieties of Arabic | Nov 1, 2019 | Feature Engineering, text-to-speech | Unverified | 0 |
| Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis | Oct 29, 2019 | Speaker Verification, Speech Synthesis | Code Available | 0 |
| Unsupervised pre-training for sequence to sequence speech recognition | Oct 28, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment | Oct 28, 2019 | Hard Attention, Speech Synthesis | Unverified | 0 |
| Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency | Oct 25, 2019 | Emotion Classification, Style Transfer | Unverified | 0 |
| Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram | Oct 25, 2019 | Generative Adversarial Network, GPU | Code Available | 2 |
| ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit | Oct 24, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis | Oct 23, 2019 | Form, Speech Synthesis | Code Available | 0 |
| G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR | Oct 22, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach | Oct 14, 2019 | Expressive Speech Synthesis, Sociology | Unverified | 0 |
| Semi-Supervised Generative Modeling for Controllable Speech Synthesis | Oct 3, 2019 | Speech Synthesis, text-to-speech | Unverified | 0 |
| High Fidelity Speech Synthesis with Adversarial Networks | Sep 25, 2019 | Generative Adversarial Network, Speech Synthesis | Code Available | 0 |
| Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech | Sep 14, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| A Comparative Study on Transformer vs RNN in Speech Applications | Sep 13, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Modular Meta-Learning with Shrinkage | Sep 12, 2019 | Image Classification, Meta-Learning | Unverified | 0 |
| Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs | Sep 9, 2019 | Form, Speech Synthesis | Unverified | 0 |
| Neural Network-Based Modeling of Phonetic Durations | Sep 6, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Large-Scale User Study of an Alexa Prize Chatbot: Effect of TTS Dynamism on Perceived Quality of Social Dialog | Sep 1, 2019 | Chatbot, text-to-speech | Unverified | 0 |
| Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments | Aug 30, 2019 | Decoder, text-to-speech | Unverified | 0 |
| Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis | Aug 27, 2019 | Speech Synthesis, text-to-speech | Unverified | 0 |
| From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories | Aug 20, 2019 | Retrieval, TAG | Unverified | 0 |
| Numbers Normalisation in the Inflected Languages: a Case Study of Polish | Aug 1, 2019 | text-to-speech, Text to Speech | Code Available | 0 |
| MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible | Jul 30, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |