
Text to Speech

from gtts import gTTS
import os


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" is selected for Kurdish).
    # Note: gTTS raises ValueError if "ku" is not in its supported-language list.
    tts = gTTS(text=text, lang="ku", slow=False)
    tts.save(output_file)
    # Open the saved audio file with the default player (Windows only).
    if os.name == "nt":
        os.startfile(output_file)


# Example ("Hello, this is my voice in Kurdish."):
# text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
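The playback step in the snippet only works on Windows. The sketch below is one way to open the saved MP3 on other systems too, assuming the desktop's default opener is available (`open` on macOS, `xdg-open` from xdg-utils on most Linux desktops). `open_command` and `play_audio` are hypothetical helper names, not part of gTTS.

```python
import os
import subprocess
import sys
from typing import List, Optional


def open_command(path: str, os_name: str = sys.platform) -> Optional[List[str]]:
    """Return the command that opens `path` with the default app.

    Returns None on Windows, where os.startfile() is used instead of a command.
    """
    if os_name.startswith("win"):
        return None  # handled by os.startfile in play_audio
    if os_name == "darwin":
        return ["open", path]  # macOS default opener
    return ["xdg-open", path]  # assumption: xdg-utils is installed (Linux)


def play_audio(path: str) -> None:
    """Open an audio file with the OS default player (hypothetical helper)."""
    cmd = open_command(path)
    if cmd is None:
        os.startfile(path)  # Windows-only API
    else:
        # Passing a list (no shell) avoids quoting problems with spaces in the path.
        subprocess.run(cmd, check=False)
```

After `tts.save(output_file)`, calling `play_audio(output_file)` would take the place of the Windows-only call. Keeping the command selection in a separate pure function makes it easy to check without launching a player.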

Papers

Showing 26-50 of 1419 papers

Title | Status | Hype
--- | --- | ---
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Code | 4
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | Code | 4
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Code | 4
Ming-Omni: A Unified Multimodal Model for Perception and Generation | Code | 4
WavChat: A Survey of Spoken Dialogue Models | Code | 3
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | Code | 3
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Code | 3
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Code | 3
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Code | 3
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control | Code | 3
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation | Code | 3
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Code | 3
SoundStream: An End-to-End Neural Audio Codec | Code | 3
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge | Code | 3
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Code | 3
MoonCast: High-Quality Zero-Shot Podcast Generation | Code | 3
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Code | 3
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Code | 3
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech | Code | 2
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Code | 2
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning | Code | 2
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform | Code | 2
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction | Code | 2
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | Code | 2

No leaderboard results yet.