Text to Speech

from gtts import gTTS
import os


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # gTTS validates the language code by default, so this call fails
    # if "ku" is not in its supported list.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file (Windows only).
    os.system(f"start {output_file}")


# Example (the text reads "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
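The snippet above opens the saved file with `start`, which only works on Windows. A minimal sketch of a portable alternative follows, using only the standard library. The helper names `player_command` and `open_audio` are hypothetical, and the fallback assumes a Linux-like desktop where `xdg-open` is installed.

```python
import platform
import subprocess


def player_command(path, system=None):
    """Return the argv list that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: any other platform is a Linux-like desktop with xdg-open.
    return ["xdg-open", path]


def open_audio(path):
    """Launch the default player for `path` without building a shell string."""
    subprocess.run(player_command(path), check=False)
```

Calling `open_audio("output.mp3")` in place of the `os.system(...)` line would keep the rest of the function unchanged. Passing an argument list instead of a formatted shell string also avoids problems with file names that contain spaces.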

Papers

Showing 1–50 of 1419 papers

Title	Date	Tasks	Status	Hype
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens	Jul 7, 2024	Language Modelling, Large Language Model	Code Available	11
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching	Oct 9, 2024	Denoising, text-to-speech	Code Available	11
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System	Feb 8, 2025	Decoder, Language Modeling	Code Available	11
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens	Mar 3, 2025	Attribute, text-to-speech	Code Available	11
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer	Sep 1, 2024	Self-Supervised Learning, text-to-speech	Code Available	9
Natural language guidance of high-fidelity text-to-speech with synthetic annotations	Feb 2, 2024	In-Context Learning, Language Modeling	Code Available	9
Moshi: a speech-text foundation model for real-time dialogue	Sep 17, 2024	Action Detection, Activity Detection	Code Available	9
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild	Mar 25, 2024	Decoder, Language Modeling	Code Available	9
Overview of the Amphion Toolkit (v0.2)	Jan 26, 2025	text-to-speech, Text to Speech	Code Available	9
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training	Feb 5, 2025	Self-Supervised Learning, Speech Enhancement	Code Available	9
Speechless: Speech Instruction Training Without Speech for Low Resource Languages	May 23, 2025	speech-recognition, Speech Recognition	Code Available	7
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers	Jan 5, 2023	In-Context Learning, Language Modeling	Code Available	7
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models	Jun 4, 2024	In-Context Learning, Language Modelling	Code Available	7
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot	Dec 3, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	7
Better speech synthesis through scaling	May 12, 2023	Image Generation, Speech Synthesis	Code Available	6
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech	Nov 7, 2022	Representation Learning, Speech Representation Learning	Code Available	6
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit	May 20, 2022	All, Automatic Speech Recognition (ASR)	Code Available	6
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech	May 31, 2023	text-to-speech, Text to Speech	Code Available	5
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models	Jun 13, 2023	Speech Synthesis, text-to-speech	Code Available	5
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation	Jan 24, 2024	text-to-speech, Text to Speech	Code Available	5
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions	Jan 20, 2023	text-to-speech, Text to Speech	Code Available	5
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation	Sep 25, 2024	text-to-speech, Text to Speech	Code Available	5
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling	Mar 7, 2023	In-Context Learning, Language Modeling	Code Available	5
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching	Jun 16, 2025	Decoder, Speech Synthesis	Code Available	4
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching	Jul 12, 2025	Dialogue Generation, text-to-speech	Code Available	4
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model	May 6, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	4
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert	Apr 18, 2023	Audio Generation, Expressive Speech Synthesis	Code Available	4
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining	Aug 10, 2023	Audio Generation, In-Context Learning	Code Available	4
Ming-Omni: A Unified Multimodal Model for Perception and Generation	Jun 11, 2025	Image Generation, text-to-speech	Code Available	4
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning	Jul 9, 2019	Speech Synthesis, text-to-speech	Code Available	3
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control	Jun 3, 2024	Speech Synthesis, text-to-speech	Code Available	3
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model	Aug 30, 2024	Audio Compression, Audio Generation	Code Available	3
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play	May 5, 2025	AI Agent, Automatic Speech Recognition	Code Available	3
WavChat: A Survey of Spoken Dialogue Models	Nov 15, 2024	speech-recognition, Speech Recognition	Code Available	3
SoundStream: An End-to-End Neural Audio Codec	Jul 7, 2021	CPU, Decoder	Code Available	3
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis	Nov 21, 2023	Speech Synthesis, Super-Resolution	Code Available	3
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech	Jul 13, 2022	Denoising, GPU	Code Available	3
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey	Dec 9, 2024	Speech Synthesis, Survey	Code Available	3
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge	May 29, 2025	text-to-speech, Text to Speech	Code Available	3
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation	Aug 14, 2024	Speech Synthesis, text-to-speech	Code Available	3
MoonCast: High-Quality Zero-Shot Podcast Generation	Mar 18, 2025	Speech Synthesis, text-to-speech	Code Available	3
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models	Mar 5, 2024	Quantization, Speech Synthesis	Code Available	3
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation	Jun 15, 2021	Speech Synthesis, text-to-speech	Code Available	3
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech	Jul 3, 2022	text-to-speech, Text to Speech	Code Available	2
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis	Oct 30, 2024	Speech Synthesis, text-to-speech	Code Available	2
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness	Apr 10, 2024	Speech Synthesis, text-to-speech	Code Available	2
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning	Jun 12, 2024	text-to-speech, Text to Speech	Code Available	2
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform	Oct 28, 2022	CPU, Knowledge Distillation	Code Available	2
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction	Oct 28, 2018	Prediction, Speech Synthesis	Code Available	2
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform	Mar 4, 2022	Speech Synthesis, text-to-speech	Code Available	2

No leaderboard results yet.