| RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis | Dec 15, 2022 | Relation, Speech Synthesis | Code Available | 1 |
| Probing Deep Speaker Embeddings for Speaker-related Tasks | Dec 14, 2022 | Speaker Recognition, Speaker Verification | Unverified | 0 |
| BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Dec 11, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset | Dec 11, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| SpeechLMScore: Evaluating speech generation using speech language model | Dec 8, 2022 | Language Modeling, Language Modelling | Code Available | 1 |
| Learning to Dub Movies via Hierarchical Prosody Models | Dec 8, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue | Dec 7, 2022 | Spoken Dialogue Systems, text-to-speech | Unverified | 0 |
| Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning | Dec 7, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech | Nov 30, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Controllable speech synthesis by learning discrete phoneme-level prosodic representations | Nov 29, 2022 | Clustering, Speech Synthesis | Unverified | 0 |
| Evaluating and reducing the distance between synthetic and real speech distributions | Nov 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Contextual Expressive Text-to-Speech | Nov 26, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Efficient Incremental Text-to-Speech on GPUs | Nov 25, 2022 | GPU, Speech Synthesis | Unverified | 0 |
| IMaSC -- ICFOSS Malayalam Speech Corpus | Nov 23, 2022 | Sentence, text-to-speech | Unverified | 0 |
| PromptTTS: Controllable Text-to-Speech with Text Descriptions | Nov 22, 2022 | Decoder, Speech Synthesis | Code Available | 0 |
| Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models | Nov 17, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Towards Building Text-To-Speech Systems for the Next Billion Users | Nov 17, 2022 | Diversity, Speech Synthesis | Code Available | 2 |
| EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance | Nov 17, 2022 | Denoising, text-to-speech | Unverified | 0 |
| Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation | Nov 17, 2022 | Data Augmentation, Machine Translation | Unverified | 0 |
| SNIPER Training: Single-Shot Sparse Training for Text-to-Speech | Nov 14, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| OverFlow: Putting flows on top of neural transducers for better TTS | Nov 13, 2022 | Normalising Flows, Speech Synthesis | Code Available | 1 |
| Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations | Nov 11, 2022 | Emotional Speech Synthesis, Speech Synthesis | Unverified | 0 |
| Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder | Nov 7, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Nov 7, 2022 | Representation Learning, Speech Representation Learning | Code Available | 6 |
| An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space | Nov 6, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Parallel Attention Forcing for Machine Translation | Nov 6, 2022 | Machine Translation, NMT | Unverified | 0 |
| Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech | Nov 4, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages | Nov 1, 2022 | Chunking, Rhythm | Unverified | 0 |
| Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features | Nov 1, 2022 | POS, Prediction | Unverified | 0 |
| Generating Multilingual Gender-Ambiguous Text-to-Speech Voices | Nov 1, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers | Nov 1, 2022 | parameter-efficient fine-tuning, Speech Synthesis | Unverified | 0 |
| Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection | Oct 31, 2022 | Audio Compression, Face Swapping | Unverified | 0 |
| Structured State Space Decoder for Speech Recognition and Synthesis | Oct 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation | Oct 31, 2022 | Decoder, Disentanglement | Unverified | 0 |
| Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform | Oct 28, 2022 | CPU, Knowledge Distillation | Code Available | 2 |
| Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis | Oct 28, 2022 | Decoder, Diversity | Unverified | 0 |
| Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation | Oct 28, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders | Oct 28, 2022 | Speaker Verification, text-to-speech | Unverified | 0 |
| FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Oct 27, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Explicit Intensity Control for Accented Text-to-speech | Oct 27, 2022 | speech-recognition, Speech Recognition | Unverified | 0 |
| Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech | Oct 27, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Improving Speech-to-Speech Translation Through Unlabeled Text | Oct 26, 2022 | Machine Translation, speech-recognition | Unverified | 0 |
| Semi-Supervised Learning Based on Reference Model for Low-resource TTS | Oct 25, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data | Oct 25, 2022 | Decoder, Disentanglement | Unverified | 0 |
| Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS | Oct 24, 2022 | Data Augmentation, GPU | Unverified | 0 |
| HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation | Oct 23, 2022 | Generative Adversarial Network, Singing Voice Synthesis | Code Available | 1 |
| Low-Resource Multilingual and Zero-Shot Multispeaker TTS | Oct 21, 2022 | Meta-Learning, text-to-speech | Unverified | 0 |
| Adaptive re-calibration of channel-wise features for Adversarial Audio Classification | Oct 21, 2022 | Audio Classification, Face Swapping | Unverified | 0 |
| Towards Relation Extraction From Speech | Oct 17, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Generating Synthetic Speech from SpokenVocab for Speech Translation | Oct 15, 2022 | Data Augmentation, Machine Translation | Code Available | 0 |