| Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch | Apr 12, 2022 | Sentence, text-to-speech | Unverified | 0 |
| The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance | Apr 11, 2022 | Speaker Verification, Speech Synthesis | Unverified | 0 |
| Fine-grained Noise Control for Multispeaker Speech Synthesis | Apr 11, 2022 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0 |
| Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech | Apr 8, 2022 | Diversity, text-to-speech | Unverified | 0 |
| Karaoker: Alignment-free singing voice synthesis with speech training data | Apr 8, 2022 | Singing Voice Synthesis, Speaker Identification | Unverified | 0 |
| Arabic Text-To-Speech (TTS) Data Preparation | Apr 7, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis | Apr 7, 2022 | Quantization, Speech Synthesis | Unverified | 0 |
| SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis | Apr 6, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification | Apr 6, 2022 | Attribute, Speaker Verification | Unverified | 0 |
| Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | Apr 6, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Deliberation Model for On-Device Spoken Language Understanding | Apr 4, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck | Apr 4, 2022 | Speaker Verification, text-to-speech | Unverified | 0 |
| VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature | Apr 2, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Text-To-Speech Data Augmentation for Low Resource Speech Recognition | Apr 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios | Apr 1, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer | Mar 31, 2022 | Text Normalization, text-to-speech | Code Available | 1 |
| Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset | Mar 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| WavThruVec: Latent speech representation as intermediate features for neural speech synthesis | Mar 31, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| A Character-level Span-based Model for Mandarin Prosodic Structure Prediction | Mar 31, 2022 | Sentence, text-to-speech | Code Available | 1 |
| JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech | Mar 31, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition | Mar 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech | Mar 31, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| End to End Lip Synchronization with a Temporal AutoEncoder | Mar 30, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| Does Audio Deepfake Detection Generalize? | Mar 30, 2022 | Audio Deepfake Detection, DeepFake Detection | Unverified | 0 |
| Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition | Mar 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Mar 29, 2022 | CPU, Decoder | Code Available | 2 |
| Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis | Mar 29, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus | Mar 29, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent | Mar 28, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge | Mar 27, 2022 | Computational Efficiency, text-to-speech | Unverified | 0 |
| A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis | Mar 22, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling | Mar 21, 2022 | Decoder, Speech Synthesis | Unverified | 0 |
| Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise | Mar 20, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis | Mar 20, 2022 | Speaker Verification, Speech Synthesis | Code Available | 0 |
| Improve few-shot voice cloning using multi-modal learning | Mar 18, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Text-free non-parallel many-to-many voice conversion using normalising flows | Mar 15, 2022 | Normalising Flows, Speech Synthesis | Unverified | 0 |
| Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features | Mar 7, 2022 | Meta-Learning, text-to-speech | Unverified | 0 |
| iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform | Mar 4, 2022 | Speech Synthesis, text-to-speech | Code Available | 2 |
| Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows | Mar 3, 2022 | Speech Synthesis, text-to-speech | Code Available | 2 |
| Revisiting Over-Smoothness in Text to Speech | Feb 26, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video | Feb 25, 2022 | Face Swapping, Human Detection | Unverified | 0 |
| Improving Cross-lingual Speech Synthesis with Triplet Training Scheme | Feb 22, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation | Feb 21, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech | Feb 16, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module | Feb 16, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Unsupervised word-level prosody tagging for controllable speech synthesis | Feb 15, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| NewsPod: Automatic and Interactive News Podcasts | Feb 15, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Distribution augmentation for low-resource expressive text-to-speech | Feb 13, 2022 | Data Augmentation, Diversity | Unverified | 0 |
| Deep Performer: Score-to-Audio Music Performance Synthesis | Feb 12, 2022 | Decoder, Speech Synthesis | Unverified | 0 |
| Cross-speaker style transfer for text-to-speech using data augmentation | Feb 10, 2022 | Data Augmentation, Style Transfer | Unverified | 0 |