SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # gTTS raises ValueError if "ku" is not in its supported language list.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (on Windows)

# Example (the string means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
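The snippet above opens the saved file with `os.system` and the Windows-only `start` command. A minimal sketch of a cross-platform alternative follows, using only the standard library. The helper names `build_open_command` and `open_audio_file` are hypothetical and not part of gTTS, and the Linux branch assumes that `xdg-open` is installed.

```python
import subprocess
import sys


def build_open_command(path, platform_name=None):
    """Return the argv list that opens `path` with the default application.

    Hypothetical helper for illustration; not part of gTTS.
    """
    platform_name = platform_name or sys.platform
    if platform_name.startswith("win"):
        # `start` treats its first quoted argument as a window title,
        # so an empty title is passed before the file path.
        return ["cmd", "/c", "start", "", path]
    if platform_name == "darwin":
        return ["open", path]
    # Assumption: a desktop Linux/BSD system with xdg-utils available.
    return ["xdg-open", path]


def open_audio_file(path):
    """Launch the default player for `path` without building a shell string."""
    subprocess.run(build_open_command(path), check=False)
```

Passing an argument list instead of an f-string keeps file names with spaces intact; calling `open_audio_file("output.mp3")` after `tts.save(...)` would take the place of the `os.system` line.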

Papers

Showing 201-225 of 1419 papers

| Title | Status | Hype |
|---|---|---|
| EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector | Code | 2 |
| Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | | 0 |
| Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2 |
| RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis | | 0 |
| Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding | | 0 |
| Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech | Code | 0 |
| Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier | Code | 2 |
| Asynchronous Tool Usage for Real-Time Agents | | 0 |
| Mitigating Unauthorized Speech Synthesis for Voice Protection | Code | 1 |
| Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation | | 0 |
| Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts | | 0 |
| Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis | | 0 |
| STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Code | 1 |
| ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams | | 0 |
| Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap | | 0 |
| Continuous Speech Tokenizer in Text To Speech | Code | 0 |
| Continuous Speech Synthesis using per-token Latent Diffusion | | 0 |
| A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages | | 0 |
| Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech | Code | 0 |
| Enhancing Crowdsourced Audio for Text-to-Speech Models | | 0 |
| DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis | | 0 |
| DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech | | 0 |
| Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation | | 0 |
| ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs | | 0 |
| IsoChronoMeter: A simple and effective isochronic translation evaluation metric | Code | 0 |
Page 9 of 57

No leaderboard results yet.