| Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | Apr 6, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis | Apr 6, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck | Apr 4, 2022 | Speaker Verification, text-to-speech | Unverified | 0 |
| Deliberation Model for On-Device Spoken Language Understanding | Apr 4, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature | Apr 2, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Text-To-Speech Data Augmentation for Low Resource Speech Recognition | Apr 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios | Apr 1, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| WavThruVec: Latent speech representation as intermediate features for neural speech synthesis | Mar 31, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset | Mar 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech | Mar 31, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition | Mar 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Does Audio Deepfake Detection Generalize? | Mar 30, 2022 | Audio Deepfake Detection, DeepFake Detection | Unverified | 0 |
| Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis | Mar 29, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus | Mar 29, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent | Mar 28, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge | Mar 27, 2022 | Computational Efficiency, text-to-speech | Unverified | 0 |
| A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis | Mar 22, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling | Mar 21, 2022 | Decoder, Speech Synthesis | Unverified | 0 |
| ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis | Mar 20, 2022 | Speaker Verification, Speech Synthesis | Code Available | 0 |
| Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise | Mar 20, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Improve few-shot voice cloning using multi-modal learning | Mar 18, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Text-free non-parallel many-to-many voice conversion using normalising flows | Mar 15, 2022 | Normalising Flows, Speech Synthesis | Unverified | 0 |
| Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features | Mar 7, 2022 | Meta-Learning, text-to-speech | Unverified | 0 |
| Revisiting Over-Smoothness in Text to Speech | Feb 26, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video | Feb 25, 2022 | Face Swapping, Human Detection | Unverified | 0 |
| Improving Cross-lingual Speech Synthesis with Triplet Training Scheme | Feb 22, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation | Feb 21, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech | Feb 16, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module | Feb 16, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Unsupervised word-level prosody tagging for controllable speech synthesis | Feb 15, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| NewsPod: Automatic and Interactive News Podcasts | Feb 15, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Distribution augmentation for low-resource expressive text-to-speech | Feb 13, 2022 | Data Augmentation, Diversity | Unverified | 0 |
| Deep Performer: Score-to-Audio Music Performance Synthesis | Feb 12, 2022 | Decoder, Speech Synthesis | Unverified | 0 |
| Cross-speaker style transfer for text-to-speech using data augmentation | Feb 10, 2022 | Data Augmentation, Style Transfer | Unverified | 0 |
| Building Synthetic Speaker Profiles in Text-to-Speech Systems | Feb 7, 2022 | Diversity, text-to-speech | Unverified | 0 |
| Multi-Stage Deep Transfer Learning for EmIoT-enabled Human-Computer Interaction | Feb 3, 2022 | Human-Object Interaction Detection, text-to-speech | Unverified | 0 |
| Transformer-based Models of Text Normalization for Speech Applications | Feb 1, 2022 | Sentence, Speech Synthesis | Unverified | 0 |
| DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs | Jan 28, 2022 | Denoising, Speech Synthesis | Unverified | 0 |
| Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition | Jan 27, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| The MSXF TTS System for ICASSP 2022 ADD Challenge | Jan 27, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention | Jan 25, 2022 | Form, Speech Synthesis | Unverified | 0 |
| Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end | Jan 24, 2022 | Morphological Analysis, Polyphone disambiguation | Unverified | 0 |
| Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training | Jan 20, 2022 | Multi-Task Learning, Speech Synthesis | Unverified | 0 |
| Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | Jan 16, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics | Jan 15, 2022 | Articles, text-to-speech | Unverified | 0 |
| A Practical Guide to Logical Access Voice Presentation Attack Detection | Jan 10, 2022 | Artifact Detection, Speaker Verification | Unverified | 0 |
| A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Jan 6, 2022 | Speech-to-Text, text-to-speech | Code Available | 0 |
| SoK: A Study of the Security on Voice Processing Systems | Dec 24, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios | Dec 23, 2021 | Diversity, Speech Synthesis | Unverified | 0 |
| Multi-speaker Emotional Text-to-speech Synthesizer | Dec 7, 2021 | All, text-to-speech | Unverified | 0 |