SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # Assumption: the installed gTTS version accepts "ku";
    # gtts.lang.tts_langs() lists the codes it supports.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file (Windows only).
    os.system(f"start {output_file}")

# Example (the text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
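The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a cross-platform alternative follows; `open_command` and `open_audio` are hypothetical helper names, not part of gTTS, and the fallback branch assumes a desktop with `xdg-open` available.

```python
import subprocess
import sys


def open_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default application.

    Hypothetical helper for illustration; not part of gTTS.
    """
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is its window-title argument.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a Linux/BSD desktop with xdg-utils installed.
    return ["xdg-open", path]


def open_audio(path):
    """Open the audio file using an argv list instead of a shell string."""
    subprocess.run(open_command(path), check=False)
```

Passing the path as a separate list element, rather than formatting it into a shell string, means file names containing spaces do not need manual quoting.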

Papers

Showing 76-100 of 1419 papers

| Title | Status | Hype |
| --- | --- | --- |
| Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising | | 0 |
| FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation | | 0 |
| OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching | | 0 |
| Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis | | 0 |
| Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese | | 0 |
| BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset | Code | 0 |
| UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech | | 0 |
| MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder | | 0 |
| Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications | | 0 |
| Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation | | 0 |
| FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech | | 0 |
| Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations | | 0 |
| VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | Code | 4 |
| Generating Narrated Lecture Videos from Slides with Synchronized Highlights | | 0 |
| Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | Code | 3 |
| Sadeed: Advancing Arabic Diacritization Through Small Language Model | | 0 |
| Towards Flow-Matching-based TTS without Classifier-Free Guidance | | 0 |
| ClonEval: An Open Voice Cloning Benchmark | Code | 0 |
| A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | | 0 |
| EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting | | 0 |
| GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM | | 0 |
| Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis | | 0 |
| AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis | | 0 |
| Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation | | 0 |
| Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | | 0 |
Page 4 of 57

No leaderboard results yet.