| STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions | May 30, 2023 | All, Automatic Speech Recognition | Unverified | 0 |
| Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages | May 30, 2023 | Prediction, text-to-speech | Unverified | 0 |
| LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus | May 30, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Towards Selection of Text-to-speech Data to Augment ASR Training | May 30, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Make-A-Voice: Unified Voice Synthesis With Discrete Representation | May 30, 2023 | Singing Voice Synthesis, text-to-speech | Unverified | 0 |
| Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis | May 29, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction | May 26, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion | May 25, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available | 0 |
| VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | May 25, 2023 | Decoder, Language Modeling | Unverified | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | Unverified | 0 |
| ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | May 23, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer | May 22, 2023 | Decoder, Denoising | Unverified | 0 |
| Text Generation with Speech Synthesis for ASR Data Augmentation | May 22, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages | May 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios | May 20, 2023 | Rhythm, text-to-speech | Unverified | 0 |
| MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting | May 19, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Data Redaction from Conditional Generative Models | May 18, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs | May 18, 2023 | Decoder, text-to-speech | Unverified | 0 |
| A unified front-end framework for English text-to-speech synthesis | May 18, 2023 | Speech Synthesis, Text Normalization | Unverified | 0 |
| Controllable Speaking Styles Using a Large Language Model | May 17, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Accented Text-to-Speech Synthesis with Limited Data | May 8, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis | May 3, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| A Review of Deep Learning Techniques for Speech Processing | Apr 30, 2023 | Automatic Speech Recognition, Deep Learning | Unverified | 0 |
| Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model | Apr 24, 2023 | Rhythm, Self-Supervised Learning | Unverified | 0 |
| DiffVoice: Text-to-Speech with Latent Diffusion | Apr 23, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers | Apr 16, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Enhancing Speech-to-Speech Translation with Multiple TTS Targets | Apr 10, 2023 | Speech-to-Speech Translation, Speech-to-Text | Unverified | 0 |
| An investigation of phrase break prediction in an End-to-End TTS system | Apr 9, 2023 | Prediction, text-to-speech | Code Available | 0 |
| ArmanTTS single-speaker Persian dataset | Apr 7, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Ensemble prosody prediction for expressive speech synthesis | Apr 3, 2023 | Diversity, Ensemble Learning | Unverified | 0 |
| AraSpot: Arabic Spoken Command Spotting | Mar 29, 2023 | Data Augmentation, Keyword Spotting | Code Available | 0 |
| Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis | Mar 27, 2023 | All, Automatic Speech Recognition | Unverified | 0 |
| Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis | Mar 24, 2023 | Generative Adversarial Network, Speech Synthesis | Unverified | 0 |
| A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI | Mar 23, 2023 | Speech Enhancement, Speech Synthesis | Unverified | 0 |
| Code-Switching Text Generation and Injection in Mandarin-English ASR | Mar 20, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Cross-speaker Emotion Transfer by Manipulating Speech Style Latents | Mar 15, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Controllable Prosody Generation With Partial Inputs | Mar 14, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis | Mar 14, 2023 | Emotional Speech Synthesis, Sentence | Unverified | 0 |
| An End-to-End Neural Network for Image-to-Audio Transformation | Mar 10, 2023 | Image to text, text-to-speech | Unverified | 0 |
| Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports | Mar 9, 2023 | text-to-speech, Text to Speech | Code Available | 0 |
| Do Prosody Transfer Models Transfer Prosody? | Mar 7, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model | Mar 6, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities | Mar 2, 2023 | Learning-To-Rank, text-to-speech | Unverified | 0 |
| LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion | Mar 2, 2023 | Grapheme-to-Phoneme Conversion, speech-recognition | Unverified | 0 |
| Leveraging Large Text Corpora for End-to-End Speech Summarization | Mar 2, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations | Mar 1, 2023 | Self-Supervised Learning, Speech Synthesis | Unverified | 0 |
| DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction | Mar 1, 2023 | Dynamic Time Warping, Metric Learning | Unverified | 0 |
| UniFLG: Unified Facial Landmark Generator from Text or Speech | Feb 28, 2023 | Decoder, Face Generation | Unverified | 0 |
| Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners | Feb 28, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis | Feb 28, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |