Text to Speech

import gTTS import os def text_to_speech_kurdish(text, output_file="output.mp3"): # گۆڕینی نووسین بۆ دەنگ بە زمانی کوردی (هەڵبژاردنی زمانی "ku" بۆ کوردی) tts = gTTS(text=text, lang='ku', slow=False) tts.save(output_file) os.system(f"start {output_file}") # کردنەوەی فایلە دەنگییەکە (لە Windows) # نموونە: text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 176–200 of 1419 papers

Title	Date	Tasks	Status	Hype
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech	Dec 16, 2024	text-to-speechText to Speech	CodeCode Available	0
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens	Dec 13, 2024	Conditional Image GenerationImage Generation	—Unverified	0
AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation	Dec 13, 2024	Data AugmentationSarcasm Detection	—Unverified	0
CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder	Dec 12, 2024	Audio SynthesisSinging Voice Synthesis	—Unverified	0
A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings	Dec 11, 2024	text-to-speechText to Speech	—Unverified	0
A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction	Dec 11, 2024	DecoderSelf-Supervised Learning	—Unverified	0
LatentSpeech: Latent Diffusion for Text-To-Speech Generation	Dec 11, 2024	text-to-speechText to Speech	—Unverified	0
Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration	Dec 11, 2024	text-to-speechText to Speech	—Unverified	0
Multimodal Latent Language Modeling with Next-Token Diffusion	Dec 11, 2024	Image GenerationLanguage Modeling	CodeCode Available	0
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey	Dec 9, 2024	Speech SynthesisSurvey	CodeCode Available	3
EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations	Dec 9, 2024	text-to-speechText to Speech	—Unverified	0
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles	Dec 4, 2024	Prosody Predictiontext-to-speech	—Unverified	0
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot	Dec 3, 2024	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	CodeCode Available	7
Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor	Dec 1, 2024	AllNatural Language Understanding	—Unverified	0
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation	Nov 27, 2024	Question AnsweringSpeech Enhancement	—Unverified	0
Continual Learning in Machine Speech Chain Using Gradient Episodic Memory	Nov 27, 2024	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis	Nov 26, 2024	Decodermultimodal generation	—Unverified	0
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM	Nov 20, 2024	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
A Context-Based Numerical Format Prediction for a Text-To-Speech System	Nov 19, 2024	text-to-speechText to Speech	—Unverified	0
Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D	Nov 19, 2024	Speech-to-Texttext-to-speech	—Unverified	0
Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation	Nov 19, 2024	text-to-speechText to Speech	—Unverified	0
WavChat: A Survey of Spoken Dialogue Models	Nov 15, 2024	speech-recognitionSpeech Recognition	CodeCode Available	3
Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models	Nov 12, 2024	Grapheme-to-Phoneme ConversionRetrieval	—Unverified	0
Debatts: Zero-Shot Debating Text-to-Speech Synthesis	Nov 10, 2024	Speech Synthesistext-to-speech	—Unverified	0
CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR	Nov 7, 2024	Language ModellingLarge Language Model	—Unverified	0

Show:10 25 50

← PrevPage 8 of 57Next →

No leaderboard results yet.