| Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement | Nov 8, 2020 | Disentanglement, Speech Synthesis | Unverified | 0 |
| Naturalization of Text by the Insertion of Pauses and Filler Words | Nov 7, 2020 | Sentence, text-to-speech | Code Available | 0 |
| Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis | Nov 6, 2020 | Decoder, Sentence | Unverified | 0 |
| Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis | Nov 6, 2020 | Decoder, Speech Synthesis | Code Available | 1 |
| Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities | Nov 5, 2020 | Chinese Word Segmentation, Speech Synthesis | Code Available | 1 |
| Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework | Nov 4, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech | Nov 4, 2020 | Graph Attention, Representation Learning | Unverified | 0 |
| Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time | Nov 4, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization | Nov 3, 2020 | Spectral Reconstruction, text-to-speech | Code Available | 1 |
| Training Wake Word Detection with Synthesized Speech Data on Confusion Words | Nov 3, 2020 | Data Augmentation, Keyword Spotting | Unverified | 0 |
| Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech | Nov 2, 2020 | Knowledge Distillation, Speech Synthesis | Unverified | 0 |
| Learning from Explanations and Demonstrations: A Pilot Study | Nov 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Nov 1, 2020 | Dynamic Time Warping, Machine Translation | Code Available | 1 |
| Effective Deep Learning Models for Automatic Diacritization of Arabic Text | Nov 1, 2020 | Arabic Text Diacritization, Decoder | Code Available | 1 |
| DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech | Oct 29, 2020 | Decoder, text-to-speech | Unverified | 0 |
| Effective Decoder Masking for Transformer Based End-to-End Speech Recognition | Oct 27, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| One-class learning towards generalized voice spoofing detection | Oct 27, 2020 | Speaker Verification, text-to-speech | Code Available | 1 |
| Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators | Oct 27, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition | Oct 26, 2020 | Emotion Recognition, Speech Emotion Recognition | Unverified | 0 |
| GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis | Oct 23, 2020 | Graph Attention, Graph Neural Network | Unverified | 0 |
| The NTU-AISG Text-to-speech System for Blizzard Challenge 2020 | Oct 22, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| NU-GAN: High resolution neural upsampling with GAN | Oct 22, 2020 | Audio Generation, Speech Synthesis | Unverified | 0 |
| A Mask-based Model for Mandarin Chinese Polyphone Disambiguation | Oct 21, 2020 | Polyphone disambiguation, text-to-speech | Unverified | 0 |
| Learning Speaker Embedding from Text-to-Speech | Oct 21, 2020 | Classification, Decoder | Code Available | 0 |
| An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems | Oct 21, 2020 | Grapheme-to-Phoneme Conversion, Relation | Unverified | 0 |
| Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction | Oct 20, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| End-to-End Text-to-Speech using Latent Duration based on VQ-VAE | Oct 19, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion | Oct 16, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview | Oct 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Improving Low Resource Code-switched ASR using Augmented Code-switched TTS | Oct 12, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems | Oct 8, 2020 | Data Augmentation, intent-classification | Unverified | 0 |
| Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling | Oct 8, 2020 | Speech Recognition, text-to-speech | Code Available | 1 |
| Latent linguistic embedding for cross-lingual text-to-speech and voice conversion | Oct 8, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Neural Speech Synthesis for Estonian | Oct 6, 2020 | Sentence, Speech Synthesis | Unverified | 0 |
| The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS | Oct 6, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| JSSS: free Japanese speech corpus for summarization and simplification | Oct 5, 2020 | Form, Speech Synthesis | Code Available | 0 |
| Compress Polyphone Pronunciation Prediction Model with Shared Labels | Oct 1, 2020 | Prediction, Quantization | Unverified | 0 |
| Automatic Arabic Dialect Identification Systems for Written Texts: A Survey | Sep 26, 2020 | Dialect Identification, Machine Translation | Unverified | 0 |
| Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries | Sep 21, 2020 | Sentence, text-to-speech | Code Available | 1 |
| Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis | Sep 17, 2020 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0 |
| Controllable neural text-to-speech synthesis using intuitive prosodic features | Sep 14, 2020 | Sentence, Speech Synthesis | Unverified | 0 |
| What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS | Sep 4, 2020 | Decoder, Sentence | Unverified | 0 |
| Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer | Sep 3, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Aug 13, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Textual Echo Cancellation | Aug 13, 2020 | Acoustic echo cancellation, speech-recognition | Unverified | 0 |
| Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding | Aug 12, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages | Aug 11, 2020 | Quantization, text-to-speech | Unverified | 0 |
| Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems | Aug 11, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions | Aug 9, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition | Aug 9, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |