Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # Check gtts.lang.tts_langs() to confirm that "ku" is available
    # in the installed gTTS version before relying on it.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file (Windows only).
    os.system(f"start {output_file}")

# Example:
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
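The snippet above passes lang='ku' straight to gTTS, which only works if the installed gTTS version lists that code. A minimal sketch of checking support first, assuming the gtts package is installed and that falling back to another language is acceptable. The helper names pick_supported_lang and synthesize are hypothetical, not part of gTTS.

```python
from typing import Iterable, Optional


def pick_supported_lang(candidates: Iterable[str], supported: Iterable[str]) -> Optional[str]:
    """Return the first candidate code found in `supported`, or None."""
    supported_set = set(supported)
    for code in candidates:
        if code in supported_set:
            return code
    return None


def synthesize(text: str, candidates: Iterable[str], output_file: str = "output.mp3") -> str:
    """Save `text` as speech using the first supported language code."""
    # Imported lazily so pick_supported_lang works without gTTS installed.
    from gtts import gTTS
    from gtts.lang import tts_langs

    candidates = list(candidates)
    # tts_langs() returns a dict mapping language codes to language names.
    lang = pick_supported_lang(candidates, tts_langs().keys())
    if lang is None:
        raise ValueError(f"None of {candidates} is supported by gTTS")
    gTTS(text=text, lang=lang, slow=False).save(output_file)
    return lang
```

Calling synthesize(text, ["ku", "tr"]) would use "ku" when it is available and otherwise try "tr". The fallback code here is only an illustration, and speech in a different language's voice will not pronounce Kurdish text correctly.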

Papers


Showing 26–50 of 1419 papers

Title	Date	Tasks	Status	Hype
Ming-Omni: A Unified Multimodal Model for Perception and Generation	Jun 11, 2025	Image Generation, text-to-speech	Code Available	4
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model	May 6, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	4
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining	Aug 10, 2023	Audio Generation, In-Context Learning	Code Available	4
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert	Apr 18, 2023	Audio Generation, Expressive Speech Synthesis	Code Available	4
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge	May 29, 2025	text-to-speech, Text to Speech	Code Available	3
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play	May 5, 2025	AI Agent, Automatic Speech Recognition	Code Available	3
MoonCast: High-Quality Zero-Shot Podcast Generation	Mar 18, 2025	Speech Synthesis, text-to-speech	Code Available	3
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey	Dec 9, 2024	Speech Synthesis, Survey	Code Available	3
WavChat: A Survey of Spoken Dialogue Models	Nov 15, 2024	speech-recognition, Speech Recognition	Code Available	3
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model	Aug 30, 2024	Audio Compression, Audio Generation	Code Available	3
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation	Aug 14, 2024	Speech Synthesis, text-to-speech	Code Available	3
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control	Jun 3, 2024	Speech Synthesis, text-to-speech	Code Available	3
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models	Mar 5, 2024	Quantization, Speech Synthesis	Code Available	3
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis	Nov 21, 2023	Speech Synthesis, Super-Resolution	Code Available	3
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech	Jul 13, 2022	Denoising, GPU	Code Available	3
SoundStream: An End-to-End Neural Audio Codec	Jul 7, 2021	CPU, Decoder	Code Available	3
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation	Jun 15, 2021	Speech Synthesis, text-to-speech	Code Available	3
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning	Jul 9, 2019	Speech Synthesis, text-to-speech	Code Available	3
Differentiable Reward Optimization for LLM based TTS system	Jul 8, 2025	text-to-speech, Text to Speech	Code Available	2
PresentAgent: Multimodal Agent for Presentation Video Generation	Jul 5, 2025	text-to-speech, Text to Speech	Code Available	2
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching	Jun 20, 2025	Scheduling, Speech Synthesis	Code Available	2
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment	May 26, 2025	text-to-speech, Text to Speech	Code Available	2
RWKVTTS: Yet another TTS based on RWKV-7	Apr 4, 2025	Computational Efficiency, text-to-speech	Code Available	2
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection	Mar 31, 2025	Fraud Detection, Large Language Model	Code Available	2
Scaling Rich Style-Prompted Text-to-Speech Datasets	Mar 6, 2025	Language Modeling, Language Modelling	Code Available	2

