| Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework | Nov 4, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Training Wake Word Detection with Synthesized Speech Data on Confusion Words | Nov 3, 2020 | Data Augmentation, Keyword Spotting | Unverified | 0 |
| Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech | Nov 2, 2020 | Knowledge Distillation, Speech Synthesis | Unverified | 0 |
| Learning from Explanations and Demonstrations: A Pilot Study | Nov 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech | Oct 29, 2020 | Decoder, text-to-speech | Unverified | 0 |
| Effective Decoder Masking for Transformer Based End-to-End Speech Recognition | Oct 27, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators | Oct 27, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition | Oct 26, 2020 | Emotion Recognition, Speech Emotion Recognition | Unverified | 0 |
| GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis | Oct 23, 2020 | Graph Attention, Graph Neural Network | Unverified | 0 |
| The NTU-AISG Text-to-speech System for Blizzard Challenge 2020 | Oct 22, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| NU-GAN: High resolution neural upsampling with GAN | Oct 22, 2020 | Audio Generation, Speech Synthesis | Unverified | 0 |
| Learning Speaker Embedding from Text-to-Speech | Oct 21, 2020 | Classification, Decoder | Code Available | 0 |
| A Mask-based Model for Mandarin Chinese Polyphone Disambiguation | Oct 21, 2020 | Polyphone disambiguation, text-to-speech | Unverified | 0 |
| An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems | Oct 21, 2020 | Grapheme-to-Phoneme Conversion, Relation | Unverified | 0 |
| Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction | Oct 20, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| End-to-End Text-to-Speech using Latent Duration based on VQ-VAE | Oct 19, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion | Oct 16, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Improving Low Resource Code-switched ASR using Augmented Code-switched TTS | Oct 12, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Latent linguistic embedding for cross-lingual text-to-speech and voice conversion | Oct 8, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems | Oct 8, 2020 | Data Augmentation, intent-classification | Unverified | 0 |
| Neural Speech Synthesis for Estonian | Oct 6, 2020 | Sentence, Speech Synthesis | Unverified | 0 |
| The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS | Oct 6, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| JSSS: free Japanese speech corpus for summarization and simplification | Oct 5, 2020 | Form, Speech Synthesis | Code Available | 0 |
| Compress Polyphone Pronunciation Prediction Model with Shared Labels | Oct 1, 2020 | Prediction, Quantization | Unverified | 0 |
| Automatic Arabic Dialect Identification Systems for Written Texts: A Survey | Sep 26, 2020 | Dialect Identification, Machine Translation | Unverified | 0 |
| Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis | Sep 17, 2020 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0 |
| Controllable neural text-to-speech synthesis using intuitive prosodic features | Sep 14, 2020 | Sentence, Speech Synthesis | Unverified | 0 |
| What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS | Sep 4, 2020 | Decoder, Sentence | Unverified | 0 |
| Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer | Sep 3, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Textual Echo Cancellation | Aug 13, 2020 | Acoustic echo cancellation, speech-recognition | Unverified | 0 |
| Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages | Aug 11, 2020 | Quantization, text-to-speech | Unverified | 0 |
| Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems | Aug 11, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition | Aug 9, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes | Aug 7, 2020 | Gaussian Processes, Speech Synthesis | Unverified | 0 |
| Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning | Aug 7, 2020 | Audio Generation, reinforcement-learning | Unverified | 0 |
| Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability | Jul 30, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Transfer Learning End-to-End Arabic Text-To-Speech (TTS) Deep Architecture | Jul 22, 2020 | Rhythm, Speech Synthesis | Unverified | 0 |
| Normalizing Text using Language Modelling based on Phonetics and String Similarity | Jun 25, 2020 | Language Modeling, Language Modelling | Unverified | 0 |
| Generic Indic Text-to-speech Synthesisers with Rapid Adaptation in an End-to-end Framework | Jun 12, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning | Jun 5, 2020 | Self-Supervised Learning, Speaker Verification | Code Available | 0 |
| NAUTILUS: a Versatile Voice Cloning System | May 22, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario | May 21, 2020 | Attribute, Speech Synthesis | Unverified | 0 |
| Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis | May 20, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech | May 19, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders | May 18, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation | May 16, 2020 | Decoder, Speech Synthesis | Unverified | 0 |
| JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment | May 15, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation | May 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN | May 12, 2020 | Few-Shot Learning, text-to-speech | Unverified | 0 |
| DiscreTalk: Text-to-Speech as a Machine Translation Problem | May 12, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |