| Hierarchical Sequence to Sequence Voice Conversion with Limited Data | Jul 15, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention | Jul 9, 2019 | Dialogue Generation, Image Captioning | Unverified | 0 |
| Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Jul 9, 2019 | Speech Synthesis, text-to-speech | Code Available | 3 |
| A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach | Jul 5, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| A Novel Approach to OCR using Image Recognition based Classification for Ancient Tamil Inscriptions in Temples | Jul 4, 2019 | Binarization, General Classification | Unverified | 0 |
| Fine-grained robust prosody transfer for single-speaker neural text-to-speech | Jul 4, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features | Jul 3, 2019 | Polyphone disambiguation, Sentence | Unverified | 0 |
| Attention model for articulatory features detection | Jul 2, 2019 | Manner Of Articulation Detection, model | Code Available | 1 |
| An adaptable task-oriented dialog system for stand-alone embedded devices | Jul 1, 2019 | Dialogue Management, Management | Unverified | 0 |
| Improving Performance of End-to-End ASR on Numeric Sequences | Jul 1, 2019 | speech-recognition, Speech Recognition | Unverified | 0 |
| RUSLAN: Russian Spoken Language Corpus for Speech Synthesis | Jun 26, 2019 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models | Jun 17, 2019 | Decoder, Speech Synthesis | Unverified | 0 |
| Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling | Jun 17, 2019 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Telephonetic: Making Neural Language Models Robust to ASR and Semantic Noise | Jun 13, 2019 | Data Augmentation, Decoder | Unverified | 0 |
| Using generative modelling to produce varied intonation for speech synthesis | Jun 10, 2019 | Sentence, Speech Synthesis | Code Available | 0 |
| Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods | Jun 7, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| MelNet: A Generative Model for Audio in the Frequency Domain | Jun 4, 2019 | Audio Generation, Music Generation | Code Available | 0 |
| Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain | Jun 3, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Neural Models of Text Normalization for Speech Applications | Jun 1, 2019 | BIG-bench Machine Learning, Speech Synthesis | Unverified | 0 |
| Customizing Grapheme-to-Phoneme System for Non-Trivial Transcription Problems in Bangla Language | Jun 1, 2019 | speech-recognition, Speech Recognition | Unverified | 0 |
| Highly Effective Arabic Diacritization using Sequence to Sequence Modeling | Jun 1, 2019 | Feature Engineering, Machine Translation | Unverified | 0 |
| Neural Text Normalization with Subword Units | Jun 1, 2019 | Machine Translation, Natural Language Understanding | Unverified | 0 |
| A Cost Efficient Approach to Correct OCR Errors in Large Document Collections | May 28, 2019 | Clustering, Language Modelling | Unverified | 0 |
| FastSpeech: Fast, Robust and Controllable Text-to-Speech | May 22, 2019 | Decoder, text-to-speech | Code Available | 2 |
| FastSpeech: Fast, Robust and Controllable Text to Speech | May 22, 2019 | Decoder, Speech Synthesis | Code Available | 2 |
| Non-Autoregressive Neural Text-to-Speech | May 21, 2019 | text-to-speech, Text to Speech | Code Available | 0 |
| Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems | May 21, 2019 | parameter estimation, Speech Synthesis | Code Available | 0 |
| CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network | May 17, 2019 | Decoder, Sentence | Unverified | 0 |
| Almost Unsupervised Text to Speech and Automatic Speech Recognition | May 13, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text | Apr 30, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| The Zero Resource Speech Challenge 2019: TTS without T | Apr 25, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| Expediting TTS Synthesis with Adversarial Vocoding | Apr 16, 2019 | text-to-speech, Text to Speech | Code Available | 0 |
| End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning | Apr 13, 2019 | Cross-Lingual Transfer, text-to-speech | Unverified | 0 |
| Building a mixed-lingual neural TTS system with only monolingual data | Apr 12, 2019 | Decoder, text-to-speech | Unverified | 0 |
| Direct speech-to-speech translation with a sequence-to-sequence model | Apr 12, 2019 | Speech Synthesis, Speech-to-Speech Translation | Code Available | 0 |
| GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram | Apr 8, 2019 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion | Apr 6, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data | Apr 4, 2019 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Speech denoising by parametric resynthesis | Apr 2, 2019 | Denoising, Resynthesis | Unverified | 0 |
| ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks | Apr 1, 2019 | Feature Engineering, text-to-speech | Code Available | 0 |
| Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora | Apr 1, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet | Mar 29, 2019 | Decoder, Speech Synthesis | Unverified | 0 |
| Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis | Mar 27, 2019 | Emotional Speech Synthesis, Expressive Speech Synthesis | Code Available | 1 |
| CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages | Mar 27, 2019 | text-to-speech, Text to Speech | Code Available | 0 |
| Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis | Mar 14, 2019 | Generative Adversarial Network, Speech Synthesis | Unverified | 0 |
| Deep Text-to-Speech System with Seq2Seq Model | Mar 11, 2019 | model, Speech Synthesis | Unverified | 0 |
| Data Efficient Voice Cloning for Neural Singing Synthesis | Feb 19, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Feb 18, 2019 | Retrieval, text-to-speech | Code Available | 1 |
| Unsupervised Polyglot Text To Speech | Feb 6, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| Hand Sign to Bangla Speech: A Deep Learning in Vision based system for Recognizing Hand Sign Digits and Generating Bangla Speech | Jan 17, 2019 | Gesture Recognition, text-to-speech | Unverified | 0 |