SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # Assumption: the installed gTTS version accepts "ku"; gtts.lang.tts_langs() lists the supported codes.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the text reads "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
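
The snippet above opens the saved file with the start command, which exists only on Windows. A minimal sketch of a portable alternative follows. The helper names player_command and open_audio are hypothetical (they are not part of gTTS), and the non-Windows branches assume that the open (macOS) or xdg-open (Linux desktop) utilities are installed.

```python
import platform
import subprocess


def player_command(path, system=None):
    """Return the argv list that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string fills its window-title slot.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: a desktop Linux/BSD system with xdg-utils available.
    return ["xdg-open", path]


def open_audio(path):
    """Open the audio file without building a shell command string."""
    subprocess.run(player_command(path), check=False)
```

Calling open_audio("output.mp3") after tts.save(...) would then replace the os.system line. Passing an argument list instead of an f-string also avoids problems with file names that contain spaces.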

Papers

Showing 551-600 of 1419 papers

Title | Status | Hype
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Code | 2
Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models | - | 0
AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis | - | 0
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | - | 0
Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation | Code | 0
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Code | 4
Towards an AI to Win Ghana's National Science and Maths Quiz | Code | 1
Let's Give a Voice to Conversational Agents in Virtual Reality | Code | 0
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Code | 1
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | - | 0
Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings | - | 0
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design | Code | 2
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Code | 1
Multilingual context-based pronunciation learning for Text-to-Speech | - | 0
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech | - | 0
Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation | Code | 1
METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer | - | 0
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Code | 1
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding | - | 0
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer | Code | 1
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs | - | 0
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis | - | 0
Controllable Emphasis with zero data for text-to-speech | - | 0
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis | - | 0
Artificial Eye for the Blind | - | 0
Text + Sketch: Image Compression at Ultra Low Rates | Code | 1
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading | - | 0
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | - | 0
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Code | 1
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech | - | 0
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | - | 0
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | Code | 0
Visual-Aware Text-to-Speech | - | 0
Expressive Machine Dubbing Through Phrase-level Cross-lingual Prosody Transfer | - | 0
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation | - | 0
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages | - | 0
Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects | Code | 1
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation | - | 0
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | Code | 5
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling | - | 0
Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech | - | 0
VIFS: An End-to-End Variational Inference for Foley Sound Synthesis | Code | 0
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | - | 0
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias | - | 0
Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model | - | 0
Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming | Code | 0
Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis | - | 0
Towards Robust FastSpeech 2 by Modelling Residual Multimodality | - | 0
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech | - | 0
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech | Code | 5
Page 12 of 29

No leaderboard results yet.