
Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # Note: gTTS raises ValueError if "ku" is not among its supported languages.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the Kurdish text reads: "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
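
The snippet above opens the saved audio file with `start`, which exists only on Windows. A minimal sketch of a portable alternative follows, using only the standard library. The helper names `open_command` and `open_file` are hypothetical, and the non-Windows branches assume that `open` (macOS) or `xdg-open` (desktop Linux with xdg-utils) is available.

```python
import platform
import subprocess


def open_command(path, system=None):
    """Return the argument list that opens `path` with the default application.

    `open_command` is a hypothetical helper for illustration, not part of gTTS.
    """
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: a desktop Linux/BSD system with xdg-utils installed.
    return ["xdg-open", path]


def open_file(path):
    """Launch the default player for `path`, passing arguments as a list."""
    subprocess.run(open_command(path), check=False)
```

Calling `open_file(output_file)` in place of the `os.system(...)` line would keep the rest of the function unchanged. Passing the arguments as a list also avoids problems with file names that contain spaces.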

Papers


Showing 326–350 of 1419 papers

Title	Date	Tasks	Status
Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2	May 22, 2025	Benchmarking, Dialogue Generation	Unverified
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling	May 21, 2025	Emotion Recognition, Face Detection	Unverified
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information	May 21, 2025	Language Modeling, Language Modelling	Unverified
Voicing Personas: Rewriting Persona Descriptions into Style Prompts for Controllable Text-to-Speech	May 21, 2025	text-to-speech, Text to Speech	Unverified
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation	May 20, 2025	Dataset Generation, Speech Synthesis	Unverified
Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising	May 20, 2025	Decoder, Denoising	Unverified
Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English	May 20, 2025	Automatic Speech Recognition, speech-recognition	Unverified
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models	May 20, 2025	text-to-speech, Text to Speech	Unverified
SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement	May 20, 2025	text-to-speech, Text to Speech	Unverified
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching	May 19, 2025	Attribute, Speech Synthesis	Unverified
Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis	May 18, 2025	Speech Synthesis, text-to-speech	Unverified
BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset	May 16, 2025	DeepFake Detection, Face Swapping	Code Available
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese	May 16, 2025	Benchmarking, Language Modeling	Unverified
UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech	May 15, 2025	Emotional Speech Synthesis, Language Modeling	Unverified
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder	May 12, 2025	text-to-speech, Text to Speech	Unverified
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications	May 12, 2025	Speech Synthesis, text-to-speech	Unverified
Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation	May 10, 2025	Grapheme-to-Phoneme Conversion, Large Language Model	Unverified
FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech	May 8, 2025	Style Transfer, text-to-speech	Unverified
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations	May 8, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Generating Narrated Lecture Videos from Slides with Synchronized Highlights	May 5, 2025	Math, text-to-speech	Unverified
Sadeed: Advancing Arabic Diacritization Through Small Language Model	Apr 30, 2025	Arabic Text Diacritization, Benchmarking	Unverified
Towards Flow-Matching-based TTS without Classifier-Free Guidance	Apr 29, 2025	Speech Synthesis, text-to-speech	Unverified
ClonEval: An Open Voice Cloning Benchmark	Apr 29, 2025	text-to-speech, Text to Speech	Code Available
A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models	Apr 22, 2025	cross-modal alignment, Script Generation	Unverified
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting	Apr 17, 2025	text-to-speech, Text to Speech	Unverified


Page 14 of 57

No leaderboard results yet.