SOTAVerified

Text to Speech

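from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # Note: "ku" may not be among the codes your gTTS version supports;
    # gtts.lang.tts_langs() returns the supported codes.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (on Windows)

# Example (the string means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")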
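
The snippet above opens the saved file with `start`, which only exists on Windows. As a minimal sketch, assuming a desktop system where `open` (macOS) or `xdg-open` (Linux with xdg-utils) is available, the same step could be made portable. The helper names `build_open_command` and `open_audio` are hypothetical and are not part of gTTS:

```python
import subprocess
import sys


def build_open_command(path, platform=sys.platform):
    """Return the argument list that opens `path` with the default application.

    Hypothetical helper for illustration; not part of gTTS.
    """
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is its window-title argument.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a Linux/BSD desktop with xdg-utils installed.
    return ["xdg-open", path]


def open_audio(path):
    """Launch the default player for `path` without building a shell string."""
    subprocess.run(build_open_command(path), check=False)
```

Passing an argument list instead of an f-string to the shell also avoids problems with spaces or special characters in `output_file`.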

Papers

Showing 1-50 of 1419 papers

Title | Status | Hype
Hear Your Code Fail, Voice-Assisted Debugging for Python | - | 0
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech | - | 0
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge | - | 0
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | - | 0
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching | Code | 4
Exploiting Leaderboards for Large-Scale Distribution of Malicious Models | - | 0
MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling | - | 0
Differentiable Reward Optimization for LLM based TTS system | Code | 2
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis | - | 0
PresentAgent: Multimodal Agent for Presentation Video Generation | Code | 2
An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | - | 0
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems | - | 0
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Code | 2
LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | - | 0
Optimizing Multilingual Text-To-Speech with Accents & Emotions | - | 0
Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement | - | 0
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Code | 1
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | - | 0
EmoNews: A Spoken Dialogue System for Expressive News Conversations | Code | 0
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | Code | 4
Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech | - | 0
StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling | - | 0
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | - | 0
S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | - | 0
Ming-Omni: A Unified Multimodal Model for Perception and Generation | Code | 4
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching | - | 0
GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions | Code | 1
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data | - | 0
Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation | - | 0
Seeing Voices: Generating A-Roll Video from Audio with Mirage | - | 0
Voice Impression Control in Zero-Shot TTS | - | 0
Intelligibility of Text-to-Speech Systems for Mathematical Expressions | - | 0
Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning | - | 0
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset | - | 0
Can we reconstruct a dysarthric voice with the large speech model Parler TTS? | - | 0
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | - | 0
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing | - | 0
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | - | 0
Towards a Japanese Full-duplex Spoken Dialogue System | - | 0
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | - | 0
Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions | - | 0
Zero-Shot Text-to-Speech for Vietnamese | - | 0
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction | - | 0
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing | - | 0
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models | - | 0
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems | - | 0
Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement | - | 0
Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation | - | 0
Can Emotion Fool Anti-spoofing? | - | 0
LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting | - | 0

No leaderboard results yet.