SOTAVerified

Text to Speech

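```python
import os
from gtts import gTTS  # the package is "gtts"; the class is gTTS

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" is selected for Kurdish).
    # gTTS raises ValueError if "ku" is not listed by gtts.lang.tts_langs().
    tts = gTTS(text=text, lang="ku", slow=False)
    tts.save(output_file)
    # Open the audio file with the default player (Windows only)
    if os.name == "nt":
        os.startfile(output_file)

# Example ("Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
```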
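The example above opens the saved MP3 only on Windows. A minimal cross-platform sketch follows, assuming each OS's usual opener (`start` through cmd.exe on Windows, `open` on macOS, `xdg-open` on Linux desktops with xdg-utils installed); `player_command` and `open_with_default_player` are hypothetical helper names, not part of gTTS.

```python
import subprocess
import sys

def player_command(path, platform=sys.platform):
    """Return the command that opens `path` with the OS default application."""
    if platform.startswith("win"):
        # 'start' is a cmd.exe builtin; the empty "" fills its window-title
        # argument so a quoted path containing spaces is not taken as the title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]  # macOS
    # Most Linux desktops; assumes xdg-utils is installed.
    return ["xdg-open", path]

def open_with_default_player(path):
    # Launch the player; a non-zero exit status is not treated as an error.
    subprocess.run(player_command(path), check=False)

# Usage after tts.save("output.mp3"):
# open_with_default_player("output.mp3")
```

Returning the command as a list keeps the platform choice testable without launching a player, and passing a list to `subprocess.run` avoids building a shell string by hand, as the unquoted `f"start {output_file}"` pattern does.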

Papers

Showing 101–150 of 1,419 papers

| Title | Status | Hype |
|---|---|---|
| SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow | | 0 |
| SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation | | 0 |
| RWKVTTS: Yet another TTS based on RWKV-7 | Code | 2 |
| TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection | Code | 2 |
| Speculative End-Turn Detector for Efficient Speech Chatbot Assistant | | 0 |
| SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System | | 0 |
| DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation | | 0 |
| Dual Audio-Centric Modality Coupling for Talking Head Generation | | 0 |
| Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication | | 0 |
| MoonCast: High-Quality Zero-Shot Podcast Generation | Code | 3 |
| MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation | | 0 |
| An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR | | 0 |
| VocalEyes: Enhancing Environmental Perception for the Visually Impaired through Vision-Language Models and Distance-Aware Object Detection | | 0 |
| Scaling Rich Style-Prompted Text-to-Speech Datasets | Code | 2 |
| InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training | | 0 |
| Direct Speech to Speech Translation: A Review | | 0 |
| Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Code | 11 |
| UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation | | 0 |
| Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale | | 0 |
| Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | | 0 |
| MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis | | 0 |
| Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding | | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | | 0 |
| NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | | 0 |
| SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer | | 0 |
| TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument | Code | 2 |
| ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech | | 0 |
| LoRP-TTS: Low-Rank Personalized Text-To-Speech | | 0 |
| Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement | | 0 |
| Synthetic Audio Helps for Cognitive State Tasks | Code | 0 |
| Speech to Speech Translation with Translatotron: A State of the Art Review | | 0 |
| Gender Bias in Instruction-Guided Speech Synthesis Models | | 0 |
| ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Code | 1 |
| IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Code | 11 |
| Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9 |
| Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech | | 0 |
| Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation | | 0 |
| Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet | Code | 1 |
| EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis | | 0 |
| VisualSpeech: Enhance Prosody with Visual Context in TTS | | 0 |
| BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights | | 0 |
| Compact Neural TTS Voices for Accessibility | | 0 |
| Overview of the Amphion Toolkit (v0.2) | Code | 9 |
| Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation | | 0 |
| Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models | | 0 |
| LoCoML: A Framework for Real-World ML Inference Pipelines | | 0 |
| Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement | | 0 |
| Development of an Inclusive Educational Platform Using Open Technologies and Machine Learning: A Case Study on Accessibility Enhancement | | 0 |
| A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data | | 0 |
| Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement | | 0 |

No leaderboard results yet.