| Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation | Jul 30, 2023 | text-to-speech, Text to Speech | Code Available | 1 |
| ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Jul 29, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer | Jul 20, 2023 | Expressive Speech Synthesis, Language Modelling | Code Available | 1 |
| Text + Sketch: Image Compression at Ultra Low Rates | Jul 4, 2023 | Image Compression, Text to Speech | Code Available | 1 |
| EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Jun 28, 2023 | Emotion Recognition, Speech Synthesis | Code Available | 1 |
| Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects | Jun 14, 2023 | Recommendation Systems, text-to-speech | Code Available | 1 |
| ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | May 29, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS | May 28, 2023 | Diversity, text-to-speech | Code Available | 1 |
| An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization | May 26, 2023 | Audio Generation, Inference Attack | Code Available | 1 |
| Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration | May 25, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| EfficientSpeech: An On-Device Text to Speech Model | May 23, 2023 | CPU, model | Code Available | 1 |
| EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | May 22, 2023 | Expressive Speech Synthesis, Speech Synthesis | Code Available | 1 |
| Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data | May 18, 2023 | Speech Enhancement, Speech Synthesis | Code Available | 1 |
| Parameter-Efficient Learning for Text-to-Speech Accent Adaptation | May 18, 2023 | Decoder, text-to-speech | Code Available | 1 |
| Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | May 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Bts-e: Audio deepfake detection using breathing-talking-silence encoder | May 5, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available | 1 |
| Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages | Mar 28, 2023 | Data Augmentation, text-to-speech | Code Available | 1 |
| Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations | Mar 3, 2023 | Speech Denoising, Speech Enhancement | Code Available | 1 |
| Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding | Mar 2, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech | Feb 27, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining | Jan 30, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech | Dec 30, 2022 | Denoising, text-to-speech | Code Available | 1 |
| StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models | Dec 29, 2022 | Data Augmentation, text-to-speech | Code Available | 1 |
| RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis | Dec 15, 2022 | Relation, Speech Synthesis | Code Available | 1 |
| MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset | Dec 11, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Dec 11, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Learning to Dub Movies via Hierarchical Prosody Models | Dec 8, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| SpeechLMScore: Evaluating speech generation using speech language model | Dec 8, 2022 | Language Modeling, Language Modelling | Code Available | 1 |
| OverFlow: Putting flows on top of neural transducers for better TTS | Nov 13, 2022 | Normalising Flows, Speech Synthesis | Code Available | 1 |
| Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder | Nov 7, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Oct 27, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation | Oct 23, 2022 | Generative Adversarial Network, Singing Voice Synthesis | Code Available | 1 |
| Towards Relation Extraction From Speech | Oct 17, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Can we use Common Voice to train a Multi-Speaker TTS system? | Oct 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline | Sep 22, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Visualising Model Training via Vowel Space for Text-To-Speech Systems | Aug 21, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Dreamento: an open-source dream engineering toolbox for sleep EEG wearables | Jul 8, 2022 | EEG, Electroencephalogram (EEG) | Code Available | 1 |
| BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus | Jul 7, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| Building African Voices | Jul 1, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Automatic Prosody Annotation with Pre-Trained Text-Speech Model | Jun 16, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning | Jun 15, 2022 | Attribute, Emotion Classification | Code Available | 1 |
| Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech | Jun 5, 2022 | Polyphone disambiguation, text-to-speech | Code Available | 1 |
| Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech | May 9, 2022 | Diversity, text-to-speech | Code Available | 1 |
| A Character-level Span-based Model for Mandarin Prosodic Structure Prediction | Mar 31, 2022 | Sentence, text-to-speech | Code Available | 1 |
| An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer | Mar 31, 2022 | Text Normalization, text-to-speech | Code Available | 1 |
| JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech | Mar 31, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| End to End Lip Synchronization with a Temporal AutoEncoder | Mar 30, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition | Mar 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Nov 29, 2021 | Spoken Language Understanding, text-to-speech | Code Available | 1 |
| More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech | Nov 19, 2021 | text-to-speech, Text to Speech | Code Available | 1 |