SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # Support for "ku" depends on the installed gTTS version; gtts.lang.tts_langs() lists the available codes.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the Kurdish string means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
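The snippet above opens the saved file with the Windows `start` command, so playback fails on other systems. A minimal sketch of a portable alternative follows, using only the standard library. The helper names `build_open_command` and `open_audio_file` are hypothetical, and the Linux branch assumes a desktop with `xdg-open` installed.

```python
import os
import subprocess
import sys


def build_open_command(path, platform=sys.platform):
    """Return the argv list that opens `path`, or None on Windows.

    Hypothetical helper: Windows is handled separately via os.startfile.
    """
    if platform.startswith("win"):
        return None
    if platform == "darwin":
        return ["open", path]  # macOS default-application launcher
    return ["xdg-open", path]  # assumption: Linux desktop with xdg-utils


def open_audio_file(path, platform=sys.platform):
    """Open an audio file with the system's default application."""
    command = build_open_command(path, platform)
    if command is None:
        os.startfile(path)  # exists only on Windows
    else:
        # A list argv avoids shell quoting problems with unusual file names.
        subprocess.run(command, check=False)
```

Calling `open_audio_file(output_file)` in place of the `os.system` line would keep the rest of the function unchanged.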

Papers

Showing 301-325 of 1419 papers

Title | Status | Hype
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset | | 0
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | | 0
Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions | | 0
Towards a Japanese Full-duplex Spoken Dialogue System | | 0
Zero-Shot Text-to-Speech for Vietnamese | | 0
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing | | 0
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction | | 0
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models | | 0
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems | | 0
Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation | | 0
Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement | | 0
Can Emotion Fool Anti-spoofing? | | 0
LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting | | 0
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes | Code | 0
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech | | 0
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling | | 0
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech | | 0
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling | | 0
KIT's Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization | | 0
Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis | | 0
SpeakStream: Streaming Text-to-Speech with Interleaved Data | | 0
CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning | | 0
MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt | | 0
RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations | | 0
What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection | | 0
Page 13 of 57

No leaderboard results yet.