| XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech | May 31, 2023 | text-to-speech, Text to Speech | Code Available | 5 |
| Towards Selection of Text-to-speech Data to Augment ASR Training | May 30, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages | May 30, 2023 | Prediction, text-to-speech | Unverified | 0 |
| LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus | May 30, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Make-A-Voice: Unified Voice Synthesis With Discrete Representation | May 30, 2023 | Singing Voice Synthesis, text-to-speech | Unverified | 0 |
| STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions | May 30, 2023 | All, Automatic Speech Recognition | Unverified | 0 |
| Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis | May 29, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | May 29, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS | May 28, 2023 | Diversity, text-to-speech | Code Available | 1 |
| An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization | May 26, 2023 | Audio Generation, Inference Attack | Code Available | 1 |
| DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction | May 26, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion | May 25, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available | 0 |
| VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | May 25, 2023 | Decoder, Language Modeling | Unverified | 0 |
| Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration | May 25, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | Unverified | 0 |
| EfficientSpeech: An On-Device Text to Speech Model | May 23, 2023 | CPU, model | Code Available | 1 |
| ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | May 23, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Text Generation with Speech Synthesis for ASR Data Augmentation | May 22, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | May 22, 2023 | Expressive Speech Synthesis, Speech Synthesis | Code Available | 1 |
| ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer | May 22, 2023 | Decoder, Denoising | Unverified | 0 |
| VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages | May 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios | May 20, 2023 | Rhythm, text-to-speech | Unverified | 0 |
| MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting | May 19, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Data Redaction from Conditional Generative Models | May 18, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Parameter-Efficient Learning for Text-to-Speech Accent Adaptation | May 18, 2023 | Decoder, text-to-speech | Code Available | 1 |
| Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | May 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data | May 18, 2023 | Speech Enhancement, Speech Synthesis | Code Available | 1 |
| A unified front-end framework for English text-to-speech synthesis | May 18, 2023 | Speech Synthesis, Text Normalization | Unverified | 0 |
| FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs | May 18, 2023 | Decoder, text-to-speech | Unverified | 0 |
| Controllable Speaking Styles Using a Large Language Model | May 17, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Better speech synthesis through scaling | May 12, 2023 | Image Generation, Speech Synthesis | Code Available | 6 |
| CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model | May 11, 2023 | Denoising, GPU | Code Available | 2 |
| Accented Text-to-Speech Synthesis with Limited Data | May 8, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Bts-e: Audio deepfake detection using breathing-talking-silence encoder | May 5, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available | 1 |
| M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis | May 3, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| A Review of Deep Learning Techniques for Speech Processing | Apr 30, 2023 | Automatic Speech Recognition, Deep Learning | Unverified | 0 |
| Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis | Apr 26, 2023 | Speech Synthesis, text-to-speech | Code Available | 2 |
| Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model | Apr 24, 2023 | Rhythm, Self-Supervised Learning | Unverified | 0 |
| DiffVoice: Text-to-Speech with Latent Diffusion | Apr 23, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Apr 18, 2023 | Audio Generation, Expressive Speech Synthesis | Code Available | 4 |
| NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Apr 18, 2023 | In-Context Learning, Speech Synthesis | Code Available | 2 |
| A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers | Apr 16, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Enhancing Speech-to-Speech Translation with Multiple TTS Targets | Apr 10, 2023 | Speech-to-Speech Translation, Speech-to-Text | Unverified | 0 |
| An investigation of phrase break prediction in an End-to-End TTS system | Apr 9, 2023 | Prediction, text-to-speech | Code Available | 0 |
| ArmanTTS single-speaker Persian dataset | Apr 7, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Ensemble prosody prediction for expressive speech synthesis | Apr 3, 2023 | Diversity, Ensemble Learning | Unverified | 0 |
| AraSpot: Arabic Spoken Command Spotting | Mar 29, 2023 | Data Augmentation, Keyword Spotting | Code Available | 0 |
| Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages | Mar 28, 2023 | Data Augmentation, text-to-speech | Code Available | 1 |
| Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis | Mar 27, 2023 | All, Automatic Speech Recognition | Unverified | 0 |
| Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis | Mar 24, 2023 | Generative Adversarial Network, Speech Synthesis | Unverified | 0 |