
Text to Speech

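import os
import sys

from gtts import gTTS


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # gTTS checks `lang` against gtts.lang.tts_langs() and raises ValueError
    # if the code is not supported.
    tts = gTTS(text=text, lang="ku", slow=False)
    tts.save(output_file)
    # Open the audio file with the default player (Windows only).
    if sys.platform == "win32":
        os.startfile(output_file)


# Example ("Hello, this is my voice in the Kurdish language."):
# text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")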
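The snippet above opens the generated file only on Windows, where it relies on cmd's `start`. The following is a minimal cross-platform sketch. It assumes that macOS provides `open` and that Linux desktops provide the freedesktop `xdg-open` utility. `open_command` is a hypothetical helper name and is not part of gTTS.

```python
import sys


def open_command(path, platform=sys.platform):
    """Return the argv that opens `path` with the OS default application.

    Hypothetical helper: it only builds the command and does not run it.
    """
    if platform == "win32":
        # `start` is a cmd built-in; the empty "" fills its window-title argument.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        # macOS: `open` launches the default app for the file type.
        return ["open", path]
    # Assumption: other platforms are Linux/BSD desktops with xdg-utils installed.
    return ["xdg-open", path]
```

Usage: `subprocess.run(open_command("output.mp3"), check=False)`. Returning the argv instead of calling `os.system` with an f-string avoids shell-quoting problems when the file name contains spaces.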

Papers

Showing 26–50 of 1419 papers

| Title | Status | Hype |
|---|---|---|
| Ming-Omni: A Unified Multimodal Model for Perception and Generation | Code | 4 |
| VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | Code | 4 |
| AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Code | 4 |
| Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Code | 4 |
| EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge | Code | 3 |
| Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | Code | 3 |
| MoonCast: High-Quality Zero-Shot Podcast Generation | Code | 3 |
| Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Code | 3 |
| WavChat: A Survey of Spoken Dialogue Models | Code | 3 |
| Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Code | 3 |
| PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Code | 3 |
| ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control | Code | 3 |
| NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Code | 3 |
| HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Code | 3 |
| ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Code | 3 |
| SoundStream: An End-to-End Neural Audio Codec | Code | 3 |
| UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation | Code | 3 |
| Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Code | 3 |
| Differentiable Reward Optimization for LLM based TTS system | Code | 2 |
| PresentAgent: Multimodal Agent for Presentation Video Generation | Code | 2 |
| RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Code | 2 |
| Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment | Code | 2 |
| RWKVTTS: Yet another TTS based on RWKV-7 | Code | 2 |
| TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection | Code | 2 |
| Scaling Rich Style-Prompted Text-to-Speech Datasets | Code | 2 |
Page 2 of 57

No leaderboard results yet.