| A Unified Transformer-based Framework for Duplex Text Normalization | Aug 23, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues | Aug 18, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints | Aug 16, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Enhancing audio quality for expressive Neural Text-to-Speech | Aug 13, 2021 | Acoustic Modelling, Speech Synthesis | Unverified | 0 |
| RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform | Aug 12, 2021 | Speaker Verification, Synthetic Speech Detection | Unverified | 0 |
| AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | Aug 9, 2021 | Talking Head Generation, text-to-speech | Unverified | 0 |
| BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text | Aug 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Speech-enabled Fixed-phrase Translator for Healthcare Accessibility | Aug 1, 2021 | Machine Translation, speech-recognition | Unverified | 0 |
| A Survey on Audio Synthesis and Audio-Visual Multimodal Processing | Aug 1, 2021 | Audio Synthesis, Music Generation | Unverified | 0 |
| Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis | Jul 27, 2021 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0 |
| Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging | Jul 26, 2021 | text-to-speech, Text to Speech | Code Available | 0 |
| Digital Einstein Experience: Fast Text-to-Speech for Conversational AI | Jul 21, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| On Prosody Modeling for ASR+TTS based Voice Conversion | Jul 20, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging | Jul 12, 2021 | Prediction, Speech Synthesis | Code Available | 0 |
| Federated Learning with Dynamic Transformer for Text to Speech | Jul 9, 2021 | Federated Learning, text-to-speech | Unverified | 0 |
| Location, Location: Enhancing the Evaluation of Text-to-Speech Synthesis Using the Rapid Prosody Transcription Paradigm | Jul 6, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style | Jul 6, 2021 | Decoder, Mixture-of-Experts | Unverified | 0 |
| Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input | Jul 5, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis | Jun 29, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Multi-Scale Spectrogram Modelling for Neural Text-to-Speech | Jun 29, 2021 | Sentence, text-to-speech | Unverified | 0 |
| Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech | Jun 29, 2021 | Decoder, Sentence | Unverified | 0 |
| Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech | Jun 24, 2021 | Generative Adversarial Network, text-to-speech | Unverified | 0 |
| Non-native English lexicon creation for bilingual speech synthesis | Jun 21, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters | Jun 19, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model | Jun 17, 2021 | Emotional Speech Synthesis, Emotion Classification | Unverified | 0 |
| Improving the expressiveness of neural vocoding with non-affine Normalizing Flows | Jun 16, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| ADEPT: A Dataset for Evaluating Prosody Transfer | Jun 15, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis | Jun 15, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| A learned conditional prior for the VAE acoustic space of a TTS system | Jun 14, 2021 | Sentence, text-to-speech | Unverified | 0 |
| SynthASR: Unlocking Synthetic Data for Speech Recognition | Jun 14, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows | Jun 10, 2021 | Disentanglement, Sentence | Unverified | 0 |
| Speech BERT Embedding For Improving Prosody in Neural TTS | Jun 8, 2021 | Decoder, text-to-speech | Unverified | 0 |
| Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios | Jun 7, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech | Jun 5, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis | Jun 3, 2021 | Data Augmentation, Speaker Verification | Unverified | 0 |
| An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis | Jun 3, 2021 | Speaker Verification, Speech Synthesis | Unverified | 0 |
| Dual Script E2E framework for Multilingual and Code-Switching ASR | Jun 2, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Corpus of Neutral Voice Speech in Brazilian Portuguese | May 21, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Learning Robust Latent Representations for Controllable Speech Synthesis | May 10, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Talrómur: A large Icelandic TTS corpus | May 1, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| On Addressing Practical Challenges for RNN-Transducer | Apr 27, 2021 | speech-recognition, Speech Recognition | Unverified | 0 |
| Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis | Apr 26, 2021 | Language Modeling, Language Modelling | Code Available | 0 |
| Non-autoregressive sequence-to-sequence voice conversion | Apr 14, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis | Apr 14, 2021 | Dependency Parsing, Representation Learning | Unverified | 0 |
| Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures | Apr 12, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation | Apr 8, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features | Apr 8, 2021 | Decoder, Speech Synthesis | Unverified | 0 |
| Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects | Apr 8, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| AI4D -- African Language Program | Apr 6, 2021 | Machine Translation, speech-recognition | Code Available | 0 |
| Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability | Apr 3, 2021 | Emotion Recognition, reinforcement-learning | Unverified | 0 |