| Exploring TTS without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020) | May 11, 2020 | Clustering, speech-recognition | Code Available | 0 |
| Luganda Text-to-Speech Machine | May 11, 2020 | text-to-speech, Text to Speech | Code Available | 0 |
| IndicSpeech: Text-to-Speech Corpus for Indian Languages | May 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Corpus Generation for Voice Command in Smart Home and the Effect of Speech Synthesis on End-to-End SLU | May 1, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Neural Text-to-Speech Synthesis for an Under-Resourced Language in a Diglossic Environment: the Case of Gascon Occitan | May 1, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech | May 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech | May 1, 2020 | Text Normalization, text-to-speech | Unverified | 0 |
| Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems | May 1, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Development and Evaluation of Speech Synthesis Corpora for Latvian | May 1, 2020 | speech-recognition, Speech Recognition | Unverified | 0 |
| Open-Source High Quality Speech Datasets for Basque, Catalan and Galician | May 1, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Style Variation as a Vantage Point for Code-Switching | May 1, 2020 | Language Modeling, Language Modelling | Unverified | 0 |
| CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech | Apr 30, 2020 | Rhythm, text-to-speech | Unverified | 0 |
| A Study of Non-autoregressive Model for Sequence Generation | Apr 22, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| ESPnet-ST: All-in-One Speech Translation Toolkit | Apr 21, 2020 | All, Automatic Speech Recognition | Unverified | 0 |
| Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System | Apr 20, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Scalable Multilingual Frontend for TTS | Apr 10, 2020 | Chunking, Machine Translation | Unverified | 0 |
| Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data | Apr 10, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Improving Readability for Automatic Speech Recognition Transcription | Apr 9, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Statistical Context-Dependent Units Boundary Correction for Corpus-based Unit-Selection Text-to-Speech | Mar 5, 2020 | Segmentation, text-to-speech | Unverified | 0 |
| GraphTTS: graph-to-sequence modelling in neural text-to-speech | Mar 4, 2020 | Graph Embedding, Graph-to-Sequence | Unverified | 0 |
| AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment | Mar 4, 2020 | text-to-speech, Text to Speech | Code Available | 0 |
| Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis | Feb 28, 2020 | Speech Synthesis, text-to-speech | Code Available | 0 |
| On the Discrepancy between Density Estimation and Sequence Generation | Feb 17, 2020 | Density Estimation, Machine Translation | Unverified | 0 |
| Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis | Feb 6, 2020 | Disentanglement, Speech Synthesis | Unverified | 0 |
| Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior | Feb 6, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization | Feb 4, 2020 | Bayesian Optimization, text-to-speech | Unverified | 0 |
| WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss | Feb 2, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network | Jan 31, 2020 | Quantization, Speech Synthesis | Unverified | 0 |
| From Speech-to-Speech Translation to Automatic Dubbing | Jan 19, 2020 | Machine Translation, Speech-to-Speech Translation | Unverified | 0 |
| Smart Summarizer for Blind People | Jan 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Parallel Neural Text-to-Speech | Jan 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems | Dec 19, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Singing Synthesis: with a little help from my attention | Dec 12, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| Neural Voice Puppetry: Audio-driven Facial Reenactment | Dec 11, 2019 | Face Model, Neural Rendering | Code Available | 0 |
| Semantic Mask for Transformer based End-to-End Speech Recognition | Dec 6, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Towards Robust Neural Vocoding for Speech Generation: A Survey | Dec 5, 2019 | Speech Synthesis, Survey | Unverified | 0 |
| Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection | Dec 2, 2019 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech | Nov 28, 2019 | Disentanglement, Expressive Speech Synthesis | Unverified | 0 |
| Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers | Nov 26, 2019 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features | Nov 21, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| Independent and automatic evaluation of acoustic-to-articulatory inversion models | Nov 15, 2019 | speech-recognition, Speech Recognition | Code Available | 0 |
| A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis | Nov 11, 2019 | Polyphone disambiguation, Speech Synthesis | Unverified | 0 |
| Emotional Voice Conversion using Multitask Learning with Text-to-speech | Nov 11, 2019 | Decoder, text-to-speech | Code Available | 0 |
| Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework | Nov 7, 2019 | Sentence, Speech Synthesis | Unverified | 0 |
| Teacher-Student Training for Robust Tacotron-based TTS | Nov 7, 2019 | Decoder, Knowledge Distillation | Unverified | 0 |
| A System for Diacritizing Four Varieties of Arabic | Nov 1, 2019 | Feature Engineering, text-to-speech | Unverified | 0 |
| Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis | Oct 29, 2019 | Speaker Verification, Speech Synthesis | Code Available | 0 |
| Unsupervised pre-training for sequence to sequence speech recognition | Oct 28, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment | Oct 28, 2019 | Hard Attention, Speech Synthesis | Unverified | 0 |
| Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency | Oct 25, 2019 | Emotion Classification, Style Transfer | Unverified | 0 |