| Environment Aware Text-to-Speech Synthesis | Oct 8, 2021 | Attribute, Disentanglement | Unverified | 0 |
| A study on the efficacy of model pre-training in developing neural text-to-speech system | Oct 8, 2021 | Computational Efficiency, text-to-speech | Unverified | 0 |
| Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech | Oct 8, 2021 | Emotion Interpretation, Expressive Speech Synthesis | Code Available | 1 |
| VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over | Oct 7, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Applying Phonological Features in Multilingual Text-To-Speech | Oct 7, 2021 | Language Acquisition, text-to-speech | Code Available | 0 |
| Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Oct 7, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| Emphasis control for parallel neural TTS | Oct 6, 2021 | Sentence, text-to-speech | Unverified | 0 |
| GANtron: Emotional Speech Synthesis with Generative Adversarial Networks | Oct 6, 2021 | Emotional Speech Synthesis, Speech Synthesis | Unverified | 0 |
| Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS | Oct 6, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Oct 6, 2021 | Speech Synthesis, Speech-to-Text | Code Available | 1 |
| Prosody-TTS: An end-to-end speech synthesis system with prosody control | Oct 6, 2021 | Rhythm, Speech Synthesis | Unverified | 0 |
| Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models | Oct 6, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis | Oct 4, 2021 | Knowledge Distillation, Speech Synthesis | Unverified | 0 |
| Neural Speech Synthesis in German | Oct 3, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Incorporating speaker embedding and post-filter network for improving speaker similarity of personalized speech synthesis system | Oct 1, 2021 | Speaker Verification, Speech Synthesis | Unverified | 0 |
| PortaSpeech: Portable and High-Quality Generative Text-to-Speech | Sep 30, 2021 | text-to-speech, Text to Speech | Code Available | 2 |
| Conditioning Sequence-to-sequence Networks with Learned Activations | Sep 29, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Guided-TTS: Text-to-Speech with Untranscribed Speech | Sep 29, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis | Sep 27, 2021 | Density Estimation, Speech Synthesis | Unverified | 0 |
| A Proposal of Automatic Error Correction in Text | Sep 24, 2021 | Information Retrieval, Language Modelling | Unverified | 0 |
| Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network | Sep 22, 2021 | Knowledge Distillation, Language Modeling | Unverified | 0 |
| On-device neural speech synthesis | Sep 17, 2021 | GPU, Speech Synthesis | Unverified | 0 |
| fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit | Sep 14, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration | Sep 12, 2021 | Decoder, text-to-speech | Code Available | 1 |
| Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis | Sep 8, 2021 | Expressive Speech Synthesis, Sentence | Unverified | 0 |
| Integrated Speech and Gesture Synthesis | Aug 25, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| A Unified Transformer-based Framework for Duplex Text Normalization | Aug 23, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues | Aug 18, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints | Aug 16, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Enhancing audio quality for expressive Neural Text-to-Speech | Aug 13, 2021 | Acoustic Modelling, Speech Synthesis | Unverified | 0 |
| RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform | Aug 12, 2021 | Speaker Verification, Synthetic Speech Detection | Unverified | 0 |
| AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | Aug 9, 2021 | Talking Head Generation, text-to-speech | Unverified | 0 |
| A Speech-enabled Fixed-phrase Translator for Healthcare Accessibility | Aug 1, 2021 | Machine Translation, speech-recognition | Unverified | 0 |
| BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text | Aug 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Survey on Audio Synthesis and Audio-Visual Multimodal Processing | Aug 1, 2021 | Audio Synthesis, Music Generation | Unverified | 0 |
| Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis | Jul 27, 2021 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0 |
| UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021 | Jul 26, 2021 | Audio Compression, Face Swapping | Code Available | 1 |
| Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging | Jul 26, 2021 | text-to-speech, Text to Speech | Code Available | 0 |
| StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion | Jul 21, 2021 | Generative Adversarial Network, text-to-speech | Code Available | 1 |
| Digital Einstein Experience: Fast Text-to-Speech for Conversational AI | Jul 21, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| On Prosody Modeling for ASR+TTS based Voice Conversion | Jul 20, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging | Jul 12, 2021 | Prediction, Speech Synthesis | Code Available | 0 |
| Federated Learning with Dynamic Transformer for Text to Speech | Jul 9, 2021 | Federated Learning, text-to-speech | Unverified | 0 |
| SoundStream: An End-to-End Neural Audio Codec | Jul 7, 2021 | CPU, Decoder | Code Available | 3 |
| AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style | Jul 6, 2021 | Decoder, Mixture-of-Experts | Unverified | 0 |
| Location, Location: Enhancing the Evaluation of Text-to-Speech Synthesis Using the Rapid Prosody Transcription Paradigm | Jul 6, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input | Jul 5, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion | Jul 4, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech | Jun 29, 2021 | Decoder, Sentence | Unverified | 0 |
| Multi-Scale Spectrogram Modelling for Neural Text-to-Speech | Jun 29, 2021 | Sentence, text-to-speech | Unverified | 0 |