
Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # Caveat: "ku" may not be in gTTS's supported-language list, in which case gTTS raises ValueError.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f'start "" "{output_file}"')  # Open the audio file (Windows only)

# Example (the text reads "Hello, this is my voice in the Kurdish language."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
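The snippet above opens the generated file with the `start` command, which exists only on Windows. As a rough sketch of a more portable approach, the helper below picks an opener command per platform using only the standard library. The names `player_command` and `open_audio` are hypothetical, and the fallback assumes a desktop Linux system with `xdg-open` installed.

```python
import subprocess
import sys


def player_command(path, platform=sys.platform):
    """Return the argument list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a desktop Linux/BSD system with xdg-utils installed.
    return ["xdg-open", path]


def open_audio(path):
    """Launch the default player for `path` without building a shell string."""
    subprocess.run(player_command(path), check=False)
```

Calling `open_audio("output.mp3")` after `tts.save(...)` would stand in for the `os.system` line. Passing an argument list rather than an interpolated shell string also keeps file names containing spaces intact.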

Papers


Showing 76–100 of 1419 papers

Title	Date	Tasks	Status	Hype
Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising	May 20, 2025	Decoder, Denoising	Unverified	0
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation	May 20, 2025	Dataset Generation, Speech Synthesis	Unverified	0
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching	May 19, 2025	Attribute, Speech Synthesis	Unverified	0
Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis	May 18, 2025	Speech Synthesis, text-to-speech	Unverified	0
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese	May 16, 2025	Benchmarking, Language Modeling	Unverified	0
BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset	May 16, 2025	DeepFake Detection, Face Swapping	Code Available	0
UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech	May 15, 2025	Emotional Speech Synthesis, Language Modeling	Unverified	0
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder	May 12, 2025	text-to-speech, Text to Speech	Unverified	0
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications	May 12, 2025	Speech Synthesis, text-to-speech	Unverified	0
Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation	May 10, 2025	Grapheme-to-Phoneme Conversion, Large Language Model	Unverified	0
FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech	May 8, 2025	Style Transfer, text-to-speech	Unverified	0
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations	May 8, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model	May 6, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	4
Generating Narrated Lecture Videos from Slides with Synchronized Highlights	May 5, 2025	Math, text-to-speech	Unverified	0
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play	May 5, 2025	AI Agent, Automatic Speech Recognition	Code Available	3
Sadeed: Advancing Arabic Diacritization Through Small Language Model	Apr 30, 2025	Arabic Text Diacritization, Benchmarking	Unverified	0
Towards Flow-Matching-based TTS without Classifier-Free Guidance	Apr 29, 2025	Speech Synthesis, text-to-speech	Unverified	0
ClonEval: An Open Voice Cloning Benchmark	Apr 29, 2025	text-to-speech, Text to Speech	Code Available	0
A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models	Apr 22, 2025	cross-modal alignment, Script Generation	Unverified	0
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting	Apr 17, 2025	text-to-speech, Text to Speech	Unverified	0
GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM	Apr 15, 2025	Quantization, Reading Comprehension	Unverified	0
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis	Apr 14, 2025	Language Modeling, Language Modelling	Unverified	0
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis	Apr 14, 2025	RAG, Retrieval-augmented Generation	Unverified	0
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation	Apr 11, 2025	text-to-speech, Text to Speech	Unverified	0
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis	Apr 10, 2025	Speech Synthesis, text-to-speech	Unverified	0


Page 4 of 57

No leaderboard results yet.