Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" chosen for Kurdish).
    # Note: gTTS validates language codes; confirm "ku" is listed by gtts.lang.tts_langs() before relying on it.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example:
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
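The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a cross-platform alternative follows, using only the Python standard library. The names `player_command` and `open_audio` are hypothetical helpers introduced for illustration, and the Linux branch assumes a desktop with `xdg-open` installed.

```python
import os
import subprocess
import sys


def player_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default app.

    Returns None on Windows, where os.startfile is used instead of
    an external command.
    """
    if platform.startswith("win"):
        return None
    if platform == "darwin":
        return ["open", path]  # macOS
    return ["xdg-open", path]  # assumption: Linux desktop with xdg-utils


def open_audio(path):
    """Open an audio file with the system's default player."""
    cmd = player_command(path)
    if cmd is None:
        os.startfile(path)  # Windows-only standard-library call
    else:
        subprocess.run(cmd, check=False)
```

Passing the path as a list element rather than interpolating it into a shell string also avoids quoting problems when the file name contains spaces.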

Papers


Showing 1–50 of 1419 papers

Title	Date	Tasks	Status	Hype
Hear Your Code Fail, Voice-Assisted Debugging for Python	Jul 20, 2025	CPU, Medical Diagnosis	Unverified	0
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech	Jul 17, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge	Jul 15, 2025	Speech Enhancement, text-to-speech	Unverified	0
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments	Jul 14, 2025	Speech-to-Text, text-to-speech	Unverified	0
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching	Jul 12, 2025	Dialogue Generation, text-to-speech	Code Available	4
Exploiting Leaderboards for Large-Scale Distribution of Malicious Models	Jul 11, 2025	Model Discovery, Text Generation	Unverified	0
MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling	Jul 11, 2025	Audio Synthesis, Language Modelling	Unverified	0
Differentiable Reward Optimization for LLM based TTS system	Jul 8, 2025	text-to-speech, Text to Speech	Code Available	2
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis	Jul 8, 2025	Data Augmentation, Mixture-of-Experts	Unverified	0
PresentAgent: Multimodal Agent for Presentation Video Generation	Jul 5, 2025	text-to-speech, Text to Speech	Code Available	2
An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS	Jun 25, 2025	Speaker Recognition, text-to-speech	Unverified	0
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems	Jun 24, 2025	text-to-speech, Text to Speech	Unverified	0
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching	Jun 20, 2025	Scheduling, Speech Synthesis	Code Available	2
LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization	Jun 20, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Optimizing Multilingual Text-To-Speech with Accents & Emotions	Jun 19, 2025	Disentanglement, Emotion Recognition	Unverified	0
Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement	Jun 19, 2025	text-to-speech, Text to Speech	Unverified	0
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems	Jun 19, 2025	Benchmarking, Descriptive	Code Available	1
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction	Jun 18, 2025	Sentence, text-to-speech	Unverified	0
EmoNews: A Spoken Dialogue System for Expressive News Conversations	Jun 16, 2025	Language Modeling, Language Modelling	Code Available	0
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching	Jun 16, 2025	Decoder, Speech Synthesis	Code Available	4
Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech	Jun 14, 2025	Grapheme-to-Phoneme Conversion, text-to-speech	Unverified	0
StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling	Jun 14, 2025	text-to-speech, Text to Speech	Unverified	0
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs	Jun 12, 2025	Speech-to-Speech Translation, text-to-speech	Unverified	0
S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation	Jun 11, 2025	Reading Comprehension, Speech Synthesis	Unverified	0
Ming-Omni: A Unified Multimodal Model for Perception and Generation	Jun 11, 2025	Image Generation, text-to-speech	Code Available	4
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching	Jun 11, 2025	Speech Synthesis, text-to-speech	Unverified	0
GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions	Jun 10, 2025	text-to-speech, Text to Speech	Code Available	1
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data	Jun 10, 2025	text-to-speech, Text to Speech	Unverified	0
Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation	Jun 9, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Seeing Voices: Generating A-Roll Video from Audio with Mirage	Jun 9, 2025	Speech Synthesis, text-to-speech	Unverified	0
Voice Impression Control in Zero-Shot TTS	Jun 6, 2025	Language Modeling, Language Modelling	Unverified	0
Intelligibility of Text-to-Speech Systems for Mathematical Expressions	Jun 5, 2025	text-to-speech, Text to Speech	Unverified	0
Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning	Jun 5, 2025	text-to-speech, Text to Speech	Unverified	0
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset	Jun 4, 2025	Speech Synthesis, text-to-speech	Unverified	0
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?	Jun 4, 2025	text-to-speech, Text to Speech	Unverified	0
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions	Jun 4, 2025	Data Augmentation, Diversity	Unverified	0
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing	Jun 4, 2025	Quantization, text-to-speech	Unverified	0
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation	Jun 4, 2025	cross-modal alignment, Lipreading	Unverified	0
Towards a Japanese Full-duplex Spoken Dialogue System	Jun 3, 2025	Spoken Dialogue Systems, text-to-speech	Unverified	0
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech	Jun 3, 2025	Speech Synthesis, text-to-speech	Unverified	0
Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions	Jun 3, 2025	Expressive Speech Synthesis, Prompt Learning	Unverified	0
Zero-Shot Text-to-Speech for Vietnamese	Jun 2, 2025	text-to-speech, Text to Speech	Unverified	0
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction	Jun 2, 2025	Speech Synthesis, text-to-speech	Unverified	0
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing	Jun 2, 2025	Keyword Spotting, speech-recognition	Unverified	0
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models	Jun 1, 2025	counterfactual, Speech Synthesis	Unverified	0
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems	May 31, 2025	Language Modeling, Language Modelling	Unverified	0
Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement	May 30, 2025	text-to-speech, Text to Speech	Unverified	0
Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation	May 30, 2025	Language Modeling, Language Modelling	Unverified	0
Can Emotion Fool Anti-spoofing?	May 29, 2025	Emotion Recognition, Speech Emotion Recognition	Unverified	0
LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting	May 29, 2025	Keyword Spotting, text-to-speech	Unverified	0


No leaderboard results yet.