Text to Speech

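from gtts import gTTS
import os


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" is selected for Kurdish).
    # gTTS raises ValueError if "ku" is not in gtts.lang.tts_langs();
    # save() sends the text to Google Translate's TTS service, so it needs network access.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file with its default player (Windows only).
    os.startfile(output_file)


# Example ("Hello, this is my voice in Kurdish."):
# text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")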
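The snippet above opens the generated file only on Windows. What follows is a minimal cross-platform sketch. It assumes the usual default-file openers: `start` via cmd.exe on Windows, `open` on macOS, and `xdg-open` from xdg-utils on most Linux desktops. `player_command` and `open_audio_file` are hypothetical helper names and are not part of gTTS.

```python
import subprocess
import sys
from typing import List


def player_command(path: str, platform: str = sys.platform) -> List[str]:
    """Build a command that opens `path` with the OS default application.

    Hypothetical helper; assumes the standard opener on each platform.
    """
    if platform.startswith("win"):
        # `start` is a cmd.exe built-in. The empty "" argument is the window
        # title, so a quoted path containing spaces is not taken as the title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: xdg-open (freedesktop.org xdg-utils) is installed, as on most Linux desktops.
    return ["xdg-open", path]


def open_audio_file(path: str) -> None:
    """Open a saved audio file (e.g. gTTS output) in the default player."""
    # Passing an argument list instead of a shell string avoids quoting
    # problems with file names that contain spaces or special characters.
    subprocess.run(player_command(path), check=True)
```

For example, `player_command("output.mp3", "darwin")` returns `["open", "output.mp3"]`. Calling `open_audio_file(output_file)` in place of the Windows-only line in `text_to_speech_kurdish` would make that function usable on macOS and Linux as well.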

Papers

Showing 151–200 of 1419 papers

Title	Date	Tasks	Status	Hype
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement	Jan 15, 2025	Computational Efficiency, CPU	Unverified	0
AI-Powered Assistive Technologies for Visual Impairment	Jan 14, 2025	Object Recognition, text-to-speech	Unverified	0
MathReader : Text-to-Speech for Mathematical Documents	Jan 13, 2025	Optical Character Recognition (OCR), text-to-speech	Code available	1
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control	Jan 10, 2025	Speech Synthesis, text-to-speech	Unverified	0
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer	Jan 10, 2025	speech-recognition, Speech Recognition	Unverified	0
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction	Jan 10, 2025	Instruction Following, Language Modeling	Unverified	0
Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron	Jan 10, 2025	Speech Synthesis, text-to-speech	Unverified	0
MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model	Jan 10, 2025	Decoder, Language Modelling	Unverified	0
Probing Speaker-specific Features in Speaker Representations	Jan 9, 2025	Self-Supervised Learning, Speaker Verification	Unverified	0
Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model	Jan 8, 2025	text-to-speech, Text to Speech	Unverified	0
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles	Jan 2, 2025	Speech Synthesis, text-to-speech	Unverified	0
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT	Jan 2, 2025	Polyphone disambiguation, Sentence	Unverified	0
RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer	Jan 2, 2025	Audio Generation, text-to-speech	Code available	2
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting	Dec 28, 2024	Speech Synthesis, text-to-speech	Unverified	0
"I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities	Dec 26, 2024	Domain Adaptation, Language Modeling	Code available	0
Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID	Dec 26, 2024	Language Identification, text-to-speech	Unverified	0
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset	Dec 25, 2024	text-to-speech, Text to Speech	Unverified	0
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis	Dec 22, 2024	Decoder, Disentanglement	Unverified	0
Autoregressive Speech Synthesis with Next-Distribution Prediction	Dec 22, 2024	Language Modeling, Language Modelling	Unverified	0
Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective	Dec 22, 2024	text-to-speech, Text to Speech	Unverified	0
Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers	Dec 20, 2024	Language Modeling, Language Modelling	Unverified	0
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling	Dec 19, 2024	Attribute, Speech Enhancement	Unverified	0
Enhancing Naturalness in LLM-Generated Utterances through Disfluency Insertion	Dec 17, 2024	text-to-speech, Text to Speech	Unverified	0
Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes	Dec 17, 2024	DeepFake Detection, Face Swapping	Unverified	0
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis	Dec 16, 2024	Speech Synthesis, text-to-speech	Unverified	0
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech	Dec 16, 2024	text-to-speech, Text to Speech	Code available	0
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens	Dec 13, 2024	Conditional Image Generation, Image Generation	Unverified	0
AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation	Dec 13, 2024	Data Augmentation, Sarcasm Detection	Unverified	0
CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder	Dec 12, 2024	Audio Synthesis, Singing Voice Synthesis	Unverified	0
A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings	Dec 11, 2024	text-to-speech, Text to Speech	Unverified	0
A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction	Dec 11, 2024	Decoder, Self-Supervised Learning	Unverified	0
LatentSpeech: Latent Diffusion for Text-To-Speech Generation	Dec 11, 2024	text-to-speech, Text to Speech	Unverified	0
Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration	Dec 11, 2024	text-to-speech, Text to Speech	Unverified	0
Multimodal Latent Language Modeling with Next-Token Diffusion	Dec 11, 2024	Image Generation, Language Modeling	Code available	0
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey	Dec 9, 2024	Speech Synthesis, Survey	Code available	3
EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations	Dec 9, 2024	text-to-speech, Text to Speech	Unverified	0
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles	Dec 4, 2024	Prosody Prediction, text-to-speech	Unverified	0
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot	Dec 3, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code available	7
Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor	Dec 1, 2024	All, Natural Language Understanding	Unverified	0
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation	Nov 27, 2024	Question Answering, Speech Enhancement	Unverified	0
Continual Learning in Machine Speech Chain Using Gradient Episodic Memory	Nov 27, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis	Nov 26, 2024	Decoder, multimodal generation	Unverified	0
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM	Nov 20, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
A Context-Based Numerical Format Prediction for a Text-To-Speech System	Nov 19, 2024	text-to-speech, Text to Speech	Unverified	0
Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D	Nov 19, 2024	Speech-to-Text, text-to-speech	Unverified	0
Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation	Nov 19, 2024	text-to-speech, Text to Speech	Unverified	0
WavChat: A Survey of Spoken Dialogue Models	Nov 15, 2024	speech-recognition, Speech Recognition	Code available	3
Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models	Nov 12, 2024	Grapheme-to-Phoneme Conversion, Retrieval	Unverified	0
Debatts: Zero-Shot Debating Text-to-Speech Synthesis	Nov 10, 2024	Speech Synthesis, text-to-speech	Unverified	0
CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR	Nov 7, 2024	Language Modelling, Large Language Model	Unverified	0