Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # Check gtts.lang.tts_langs() to confirm that "ku" is supported by
    # the installed gTTS version before relying on it.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
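The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a cross-platform alternative follows, using only the standard library. The helper names `open_command` and `play_file` are hypothetical and not part of gTTS, and the sketch assumes `open` is available on macOS and `xdg-open` on Linux desktops.

```python
import subprocess
import sys


def open_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]  # macOS
    return ["xdg-open", path]  # assumed to be present on most Linux desktops


def play_file(path):
    """Open the audio file in the default player (hypothetical helper)."""
    # check=False: a missing opener should not crash the calling script.
    subprocess.run(open_command(path), check=False)
```

Calling `play_file("output.mp3")` after `tts.save(...)` would replace the `os.system` line. Passing the arguments as a list also avoids shell quoting problems when the file name contains spaces.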

Papers

Showing 351–400 of 1419 papers

Title	Date	Tasks	Status
GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM	Apr 15, 2025	Quantization, Reading Comprehension	Unverified
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis	Apr 14, 2025	Language Modeling, Language Modelling	Unverified
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis	Apr 14, 2025	RAG, Retrieval-augmented Generation	Unverified
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation	Apr 11, 2025	text-to-speech, Text to Speech	Unverified
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow	Apr 10, 2025	Speech Synthesis, text-to-speech	Unverified
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis	Apr 10, 2025	Speech Synthesis, text-to-speech	Unverified
SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation	Apr 7, 2025	text-to-speech, Text to Speech	Unverified
Speculative End-Turn Detector for Efficient Speech Chatbot Assistant	Mar 30, 2025	Chatbot, Collaborative Inference	Unverified
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System	Mar 29, 2025	Speech Synthesis, text-to-speech	Unverified
DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation	Mar 28, 2025	Audio Generation, Audio-Visual Synchronization	Unverified
Dual Audio-Centric Modality Coupling for Talking Head Generation	Mar 26, 2025	NeRF, Talking Head Generation	Unverified
Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication	Mar 21, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation	Mar 14, 2025	text-to-speech, Text to Speech	Unverified
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR	Mar 11, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
VocalEyes: Enhancing Environmental Perception for the Visually Impaired through Vision-Language Models and Distance-Aware Object Detection	Mar 10, 2025	NVIDIA Jetson Orin Nano, object-detection	Unverified
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training	Mar 4, 2025	Instruction Following, text-to-speech	Unverified
Direct Speech to Speech Translation: A Review	Mar 3, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation	Mar 2, 2025	Decoder, Representation Learning	Unverified
Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale	Feb 27, 2025	AI Agent, Large Language Model	Unverified
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding	Feb 26, 2025	text-to-speech, Text to Speech	Unverified
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis	Feb 26, 2025	Speech Synthesis, text-to-speech	Unverified
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision	Feb 26, 2025	Audio Synthesis, Automatic Speech Recognition	Unverified
Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM	Feb 24, 2025	Automatic Speech Recognition, Language Modeling	Unverified
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing	Feb 17, 2025	Lip to Speech Synthesis, speech-recognition	Unverified
SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer	Feb 16, 2025	text-to-speech, Text to Speech	Unverified
ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech	Feb 13, 2025	Adversarial Attack, Adversarial Attack Detection	Unverified
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement	Feb 11, 2025	Disentanglement, text-to-speech	Unverified
LoRP-TTS: Low-Rank Personalized Text-To-Speech	Feb 11, 2025	Speech Synthesis, text-to-speech	Unverified
Synthetic Audio Helps for Cognitive State Tasks	Feb 10, 2025	text-to-speech, Text to Speech	Code Available
Speech to Speech Translation with Translatotron: A State of the Art Review	Feb 9, 2025	speech-recognition, Speech Recognition	Unverified
Gender Bias in Instruction-Guided Speech Synthesis Models	Feb 8, 2025	Expressive Speech Synthesis, Speech Synthesis	Unverified
Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech	Feb 5, 2025	Language Modeling, Language Modelling	Unverified
Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation	Feb 4, 2025	Change Detection, Gender Classification	Unverified
EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis	Feb 2, 2025	Self-Supervised Learning, SSIM	Unverified
VisualSpeech: Enhance Prosody with Visual Context in TTS	Jan 31, 2025	Prosody Prediction, text-to-speech	Unverified
BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights	Jan 29, 2025	Language Modeling, Language Modelling	Unverified
Compact Neural TTS Voices for Accessibility	Jan 28, 2025	Speech Synthesis, text-to-speech	Unverified
Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models	Jan 24, 2025	Emotion Classification, Speaker Identification	Unverified
Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation	Jan 24, 2025	Audio Deepfake Detection, DeepFake Detection	Unverified
LoCoML: A Framework for Real-World ML Inference Pipelines	Jan 24, 2025	Automatic Speech Recognition, Machine Translation	Unverified
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement	Jan 23, 2025	Data Augmentation, Speech Enhancement	Unverified
Development of an Inclusive Educational Platform Using Open Technologies and Machine Learning: A Case Study on Accessibility Enhancement	Jan 22, 2025	Object Recognition, speech-recognition	Unverified
A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data	Jan 21, 2025	Domain Adaptation, speech-recognition	Unverified
Speech Synthesis along Perceptual Voice Quality Dimensions	Jan 15, 2025	Expressive Speech Synthesis, Speech Synthesis	Unverified
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement	Jan 15, 2025	Computational Efficiency, CPU	Unverified
AI-Powered Assistive Technologies for Visual Impairment	Jan 14, 2025	Object Recognition, text-to-speech	Unverified
MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model	Jan 10, 2025	Decoder, Language Modelling	Unverified
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control	Jan 10, 2025	Speech Synthesis, text-to-speech	Unverified
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer	Jan 10, 2025	speech-recognition, Speech Recognition	Unverified
Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron	Jan 10, 2025	Speech Synthesis, text-to-speech	Unverified
