| Building Synthetic Speaker Profiles in Text-to-Speech Systems | Feb 7, 2022 | Diversity, text-to-speech | Unverified | 0 |
| Multi-Stage Deep Transfer Learning for EmIoT-enabled Human-Computer Interaction | Feb 3, 2022 | Human-Object Interaction Detection, text-to-speech | Unverified | 0 |
| Transformer-based Models of Text Normalization for Speech Applications | Feb 1, 2022 | Sentence, Speech Synthesis | Unverified | 0 |
| DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs | Jan 28, 2022 | Denoising, Speech Synthesis | Unverified | 0 |
| Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition | Jan 27, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| The MSXF TTS System for ICASSP 2022 ADD Challenge | Jan 27, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention | Jan 25, 2022 | Form, Speech Synthesis | Unverified | 0 |
| Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end | Jan 24, 2022 | Morphological Analysis, Polyphone disambiguation | Unverified | 0 |
| Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training | Jan 20, 2022 | Multi-Task Learning, Speech Synthesis | Unverified | 0 |
| Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | Jan 16, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics | Jan 15, 2022 | Articles, text-to-speech | Unverified | 0 |
| A Practical Guide to Logical Access Voice Presentation Attack Detection | Jan 10, 2022 | Artifact Detection, Speaker Verification | Unverified | 0 |
| A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Jan 6, 2022 | Speech-to-Text, text-to-speech | Code Available | 0 |
| SoK: A Study of the Security on Voice Processing Systems | Dec 24, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios | Dec 23, 2021 | Diversity, Speech Synthesis | Unverified | 0 |
| Multi-speaker Emotional Text-to-speech Synthesizer | Dec 7, 2021 | All, text-to-speech | Unverified | 0 |
| Speech-T: Transducer for Text to Speech and Beyond | Dec 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Generating Rich Product Descriptions for Conversational E-commerce Systems | Nov 30, 2021 | Sentence, text-to-speech | Unverified | 0 |
| ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Nov 29, 2021 | Spoken Language Understanding, text-to-speech | Code Available | 1 |
| Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance | Nov 23, 2021 | speech-recognition, Speech Recognition | Unverified | 0 |
| Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control | Nov 19, 2021 | Clustering, Data Augmentation | Unverified | 0 |
| Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages | Nov 19, 2021 | Data Augmentation, speech-recognition | Unverified | 0 |
| Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis | Nov 19, 2021 | Clustering, Decoder | Unverified | 0 |
| More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech | Nov 19, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency | Nov 17, 2021 | CPU, Decoder | Unverified | 0 |
| Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech | Nov 16, 2021 | Diversity, text-to-speech | Unverified | 0 |
| Speech Synthesis for Low Resource Languages using Transliteration Enabled Transfer Learning | Nov 16, 2021 | speech-recognition, Speech Recognition | Unverified | 0 |
| Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning | Nov 14, 2021 | Disentanglement, Meta-Learning | Unverified | 0 |
| Speaker Generation | Nov 7, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Emotional Prosody Control for Speech Generation | Nov 7, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | Nov 7, 2021 | Meta-Learning, Speech Synthesis | Code Available | 1 |
| fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit | Nov 1, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Controlling Prosody in End-to-End TTS: A Case Study on Contrastive Focus Generation | Nov 1, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| ViDA-MAN: Visual Dialog with Digital Humans | Oct 26, 2021 | speech-recognition, Speech Recognition | Unverified | 0 |
| DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021 | Oct 25, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech | Oct 24, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Oct 18, 2021 | Speech Synthesis, Synthetic Speech Detection | Code Available | 1 |
| ESPnet2-TTS: Extending the Edge of TTS Research | Oct 15, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation | Oct 15, 2021 | Data Augmentation, Simultaneous Speech-to-Speech Translation | Unverified | 0 |
| Neural Dubber: Dubbing for Videos According to Scripts | Oct 15, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech | Oct 14, 2021 | Disentanglement, text-to-speech | Unverified | 0 |
| FedSpeech: Federated Text-to-Speech with Continual Learning | Oct 14, 2021 | Continual Learning, Federated Learning | Unverified | 0 |
| SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation | Oct 14, 2021 | Generative Adversarial Network, GPU | Unverified | 0 |
| Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data | Oct 14, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Revisiting IPA-based Cross-lingual Text-to-speech | Oct 14, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Systematic Inequalities in Language Technology Performance across the World's Languages | Oct 13, 2021 | Dependency Parsing, Machine Translation | Code Available | 0 |
| A Melody-Unsupervision Model for Singing Voice Synthesis | Oct 13, 2021 | model, Singing Voice Synthesis | Unverified | 0 |
| Fine-grained style control in Transformer-based Text-to-speech Synthesis | Oct 12, 2021 | Inductive Bias, Speech Synthesis | Code Available | 1 |
| Adapting TTS models For New Speakers using Transfer Learning | Oct 12, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis | Oct 9, 2021 | Lifelong learning, Speech Synthesis | Code Available | 0 |