Text to Speech

from gtts import gTTS
import os


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # gTTS raises ValueError if "ku" is not listed in gtts.lang.tts_langs().
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file with the default player (Windows only).
    os.startfile(output_file)


# Example ("Hello, this is my voice in Kurdish."):
# text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
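The function above only opens the saved MP3 on Windows. The following is a minimal sketch of a cross-platform alternative. It assumes that `open` (macOS) and `xdg-open` (Linux desktops) are available on PATH. `opener_command` and `open_audio_file` are hypothetical helper names and are not part of gTTS.

```python
import os
import subprocess
import sys


def opener_command(path, platform=None):
    """Return the command that opens `path` in the default app.

    Returns None on Windows, where os.startfile is used instead.
    Assumes `open` (macOS) and `xdg-open` (Linux) exist on PATH.
    """
    platform = platform or sys.platform
    if platform == "win32":
        return None
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]


def open_audio_file(path):
    """Open an audio file such as output.mp3 with the OS default player."""
    command = opener_command(path)
    if command is None:
        # Windows: same effect as `start <file>`, but safe with spaces in the path.
        os.startfile(path)
    else:
        # Do not raise if the opener exits with a non-zero status.
        subprocess.run(command, check=False)
```

Separating `opener_command` from `open_audio_file` keeps the platform choice testable without actually launching a player. To use it, call `open_audio_file(output_file)` in place of the Windows-only call.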

Papers

Showing 401–450 of 1419 papers

Title	Date	Tasks	Status
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer	Jan 10, 2025	speech-recognition, Speech Recognition	Unverified
Probing Speaker-specific Features in Speaker Representations	Jan 9, 2025	Self-Supervised Learning, Speaker Verification	Unverified
Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model	Jan 8, 2025	text-to-speech, Text to Speech	Unverified
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles	Jan 2, 2025	Speech Synthesis, text-to-speech	Unverified
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT	Jan 2, 2025	Polyphone disambiguation, Sentence	Unverified
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting	Dec 28, 2024	Speech Synthesis, text-to-speech	Unverified
Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID	Dec 26, 2024	Language Identification, text-to-speech	Unverified
"I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities	Dec 26, 2024	Domain Adaptation, Language Modeling	Code Available
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset	Dec 25, 2024	text-to-speech, Text to Speech	Unverified
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis	Dec 22, 2024	Decoder, Disentanglement	Unverified
Autoregressive Speech Synthesis with Next-Distribution Prediction	Dec 22, 2024	Language Modeling, Language Modelling	Unverified
Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective	Dec 22, 2024	text-to-speech, Text to Speech	Unverified
Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers	Dec 20, 2024	Language Modeling, Language Modelling	Unverified
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling	Dec 19, 2024	Attribute, Speech Enhancement	Unverified
Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes	Dec 17, 2024	DeepFake Detection, Face Swapping	Unverified
Enhancing Naturalness in LLM-Generated Utterances through Disfluency Insertion	Dec 17, 2024	text-to-speech, Text to Speech	Unverified
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis	Dec 16, 2024	Speech Synthesis, text-to-speech	Unverified
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech	Dec 16, 2024	text-to-speech, Text to Speech	Code Available
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens	Dec 13, 2024	Conditional Image Generation, Image Generation	Unverified
AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation	Dec 13, 2024	Data Augmentation, Sarcasm Detection	Unverified
CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder	Dec 12, 2024	Audio Synthesis, Singing Voice Synthesis	Unverified
A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings	Dec 11, 2024	text-to-speech, Text to Speech	Unverified
Multimodal Latent Language Modeling with Next-Token Diffusion	Dec 11, 2024	Image Generation, Language Modeling	Code Available
A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction	Dec 11, 2024	Decoder, Self-Supervised Learning	Unverified
LatentSpeech: Latent Diffusion for Text-To-Speech Generation	Dec 11, 2024	text-to-speech, Text to Speech	Unverified
Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration	Dec 11, 2024	text-to-speech, Text to Speech	Unverified
EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations	Dec 9, 2024	text-to-speech, Text to Speech	Unverified
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles	Dec 4, 2024	Prosody Prediction, text-to-speech	Unverified
Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor	Dec 1, 2024	All, Natural Language Understanding	Unverified
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation	Nov 27, 2024	Question Answering, Speech Enhancement	Unverified
Continual Learning in Machine Speech Chain Using Gradient Episodic Memory	Nov 27, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis	Nov 26, 2024	Decoder, multimodal generation	Unverified
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM	Nov 20, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
A Context-Based Numerical Format Prediction for a Text-To-Speech System	Nov 19, 2024	text-to-speech, Text to Speech	Unverified
Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D	Nov 19, 2024	Speech-to-Text, text-to-speech	Unverified
Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation	Nov 19, 2024	text-to-speech, Text to Speech	Unverified
Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models	Nov 12, 2024	Grapheme-to-Phoneme Conversion, Retrieval	Unverified
Debatts: Zero-Shot Debating Text-to-Speech Synthesis	Nov 10, 2024	Speech Synthesis, text-to-speech	Unverified
CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR	Nov 7, 2024	Language Modelling, Large Language Model	Unverified
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?	Oct 31, 2024	Rhythm, speech-recognition	Unverified
Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech	Oct 29, 2024	Decoder, text-to-speech	Code Available
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding	Oct 29, 2024	Speech Synthesis, text-to-speech	Unverified
RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis	Oct 29, 2024	Denoising, Singing Voice Synthesis	Unverified
Asynchronous Tool Usage for Real-Time Agents	Oct 28, 2024	Automatic Speech Recognition, speech-recognition	Unverified
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation	Oct 27, 2024	parameter-efficient fine-tuning, Question Answering	Unverified
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis	Oct 24, 2024	Speech Synthesis, text-to-speech	Unverified
Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts	Oct 24, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams	Oct 23, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap	Oct 22, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Continuous Speech Tokenizer in Text To Speech	Oct 22, 2024	Language Modeling, Language Modelling	Code Available
