SOTAVerified

Text to Speech

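from gtts import gTTS  # the package is `gtts`; the class is `gTTS`
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # Note: gTTS may not support "ku"; check gtts.lang.tts_langs() first.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file (Windows only); quotes guard against spaces in the path
    os.system(f'start "" "{output_file}"')

# Example (the string means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")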
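
The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a cross-platform alternative follows, using only the standard library. The helper names `opener_command` and `open_audio` are hypothetical, and the Linux branch assumes a desktop that provides `xdg-open`.

```python
import os
import subprocess
import sys


def opener_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default application.

    Hypothetical helper: `platform` takes values in the style of sys.platform.
    """
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty "" fills its window-title slot.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a freedesktop-style Linux desktop with xdg-open installed.
    return ["xdg-open", path]


def open_audio(path):
    """Open an existing audio file with the platform's default player."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    # Passing a list avoids shell quoting problems with spaces in file names.
    subprocess.run(opener_command(path), check=False)
```

Calling `open_audio("output.mp3")` after `tts.save("output.mp3")` would replace the `os.system` line, under the assumptions stated above.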

Papers

Showing 1-50 of 1419 papers

Title | Status | Hype
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Code | 11
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | Code | 11
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Code | 11
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Code | 11
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | Code | 9
Natural language guidance of high-fidelity text-to-speech with synthetic annotations | Code | 9
Moshi: a speech-text foundation model for real-time dialogue | Code | 9
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Code | 9
Overview of the Amphion Toolkit (v0.2) | Code | 9
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9
Speechless: Speech Instruction Training Without Speech for Low Resource Languages | Code | 7
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | Code | 7
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Code | 7
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Code | 7
Better speech synthesis through scaling | Code | 6
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Code | 6
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | Code | 6
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech | Code | 5
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | Code | 5
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Code | 5
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions | Code | 5
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation | Code | 5
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Code | 5
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | Code | 4
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching | Code | 4
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | Code | 4
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Code | 4
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Code | 4
Ming-Omni: A Unified Multimodal Model for Perception and Generation | Code | 4
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Code | 3
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control | Code | 3
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Code | 3
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | Code | 3
WavChat: A Survey of Spoken Dialogue Models | Code | 3
SoundStream: An End-to-End Neural Audio Codec | Code | 3
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Code | 3
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Code | 3
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Code | 3
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge | Code | 3
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Code | 3
MoonCast: High-Quality Zero-Shot Podcast Generation | Code | 3
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Code | 3
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation | Code | 3
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech | Code | 2
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Code | 2
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning | Code | 2
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform | Code | 2
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction | Code | 2
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform | Code | 2

No leaderboard results yet.