Text to Speech

import gTTS import os def text_to_speech_kurdish(text, output_file="output.mp3"): # گۆڕینی نووسین بۆ دەنگ بە زمانی کوردی (هەڵبژاردنی زمانی "ku" بۆ کوردی) tts = gTTS(text=text, lang='ku', slow=False) tts.save(output_file) os.system(f"start {output_file}") # کردنەوەی فایلە دەنگییەکە (لە Windows) # نموونە: text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 301–350 of 1419 papers

Title	Date	Tasks	Status
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation	Jun 4, 2025	cross-modal alignmentLipreading	—Unverified
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech	Jun 3, 2025	Speech Synthesistext-to-speech	—Unverified
Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions	Jun 3, 2025	Expressive Speech SynthesisPrompt Learning	—Unverified
Towards a Japanese Full-duplex Spoken Dialogue System	Jun 3, 2025	Spoken Dialogue Systemstext-to-speech	—Unverified
Zero-Shot Text-to-Speech for Vietnamese	Jun 2, 2025	text-to-speechText to Speech	—Unverified
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing	Jun 2, 2025	Keyword Spottingspeech-recognition	—Unverified
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction	Jun 2, 2025	Speech Synthesistext-to-speech	—Unverified
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models	Jun 1, 2025	counterfactualSpeech Synthesis	—Unverified
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems	May 31, 2025	Language ModelingLanguage Modelling	—Unverified
Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement	May 30, 2025	text-to-speechText to Speech	—Unverified
Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation	May 30, 2025	Language ModelingLanguage Modelling	—Unverified
Can Emotion Fool Anti-spoofing?	May 29, 2025	Emotion RecognitionSpeech Emotion Recognition	—Unverified
LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting	May 29, 2025	Keyword Spottingtext-to-speech	—Unverified
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes	May 29, 2025	Audio Deepfake DetectionDeepFake Detection	CodeCode Available
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech	May 27, 2025	Style Transfertext-to-speech	—Unverified
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech	May 26, 2025	AttributeEmotional Speech Synthesis	—Unverified
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling	May 26, 2025	SentenceSpeech Synthesis	—Unverified
KIT's Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization	May 26, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling	May 26, 2025	GPUtext-to-speech	—Unverified
SpeakStream: Streaming Text-to-Speech with Interleaved Data	May 25, 2025	Decodertext-to-speech	—Unverified
Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis	May 25, 2025	Speech Synthesistext-to-speech	—Unverified
CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning	May 25, 2025	text-to-speechText to Speech	—Unverified
MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt	May 24, 2025	text-to-speechText to Speech	—Unverified
RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations	May 24, 2025	Expressive Speech SynthesisSpeech Synthesis	—Unverified
What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection	May 23, 2025	Face SwappingSensitivity	—Unverified
Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2	May 22, 2025	BenchmarkingDialogue Generation	—Unverified
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling	May 21, 2025	Emotion RecognitionFace Detection	—Unverified
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information	May 21, 2025	Language ModelingLanguage Modelling	—Unverified
Voicing Personas: Rewriting Persona Descriptions into Style Prompts for Controllable Text-to-Speech	May 21, 2025	text-to-speechText to Speech	—Unverified
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation	May 20, 2025	Dataset GenerationSpeech Synthesis	—Unverified
Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising	May 20, 2025	DecoderDenoising	—Unverified
Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English	May 20, 2025	Automatic Speech Recognitionspeech-recognition	—Unverified
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models	May 20, 2025	text-to-speechText to Speech	—Unverified
SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement	May 20, 2025	text-to-speechText to Speech	—Unverified
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching	May 19, 2025	AttributeSpeech Synthesis	—Unverified
Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis	May 18, 2025	Speech Synthesistext-to-speech	—Unverified
BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset	May 16, 2025	DeepFake DetectionFace Swapping	CodeCode Available
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese	May 16, 2025	BenchmarkingLanguage Modeling	—Unverified
UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech	May 15, 2025	Emotional Speech SynthesisLanguage Modeling	—Unverified
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder	May 12, 2025	text-to-speechText to Speech	—Unverified
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications	May 12, 2025	Speech Synthesistext-to-speech	—Unverified
Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation	May 10, 2025	Grapheme-to-Phoneme ConversionLarge Language Model	—Unverified
FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech	May 8, 2025	Style Transfertext-to-speech	—Unverified
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations	May 8, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Generating Narrated Lecture Videos from Slides with Synchronized Highlights	May 5, 2025	Mathtext-to-speech	—Unverified
Sadeed: Advancing Arabic Diacritization Through Small Language Model	Apr 30, 2025	Arabic Text DiacritizationBenchmarking	—Unverified
Towards Flow-Matching-based TTS without Classifier-Free Guidance	Apr 29, 2025	Speech Synthesistext-to-speech	—Unverified
ClonEval: An Open Voice Cloning Benchmark	Apr 29, 2025	text-to-speechText to Speech	CodeCode Available
A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models	Apr 22, 2025	cross-modal alignmentScript Generation	—Unverified
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting	Apr 17, 2025	text-to-speechText to Speech	—Unverified

Show:10 25 50

← PrevPage 7 of 29Next →

No leaderboard results yet.