SOTAVerified

Text to Speech

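from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # gTTS validates the language code and raises ValueError if it is unsupported,
    # so confirm that "ku" appears in gtts.lang.tts_langs() before relying on it.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")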
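
The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a portable alternative follows, using only the standard library. The helper names `open_command` and `open_audio` are hypothetical, and the Linux branch assumes a desktop with `xdg-open` available.

```python
import subprocess
import sys


def open_command(path, platform=sys.platform):
    """Return the argument list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        # macOS ships the `open` command.
        return ["open", path]
    # Assumption: a Linux or BSD desktop with xdg-utils installed.
    return ["xdg-open", path]


def open_audio(path):
    """Open the audio file using an argument list rather than a shell string."""
    subprocess.run(open_command(path), check=False)
```

Passing an argument list instead of an interpolated shell string also means file names containing spaces do not need manual quoting.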

Papers

Showing 1-25 of 1419 papers

Title | Status | Hype
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Code | 11
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Code | 11
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Code | 11
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | Code | 11
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9
Moshi: a speech-text foundation model for real-time dialogue | Code | 9
Overview of the Amphion Toolkit (v0.2) | Code | 9
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | Code | 9
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Code | 9
Natural language guidance of high-fidelity text-to-speech with synthetic annotations | Code | 9
Speechless: Speech Instruction Training Without Speech for Low Resource Languages | Code | 7
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | Code | 7
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Code | 7
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Code | 7
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | Code | 6
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Code | 6
Better speech synthesis through scaling | Code | 6
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Code | 5
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | Code | 5
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech | Code | 5
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions | Code | 5
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation | Code | 5
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Code | 5
Ming-Omni: A Unified Multimodal Model for Perception and Generation | Code | 4
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Code | 4

No leaderboard results yet.