SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" is selected for Kurdish).
    # Note: this assumes the installed gTTS version accepts "ku"; that is not verified here.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the Kurdish text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
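The snippet above opens the saved file with `start`, which exists only on Windows. A minimal sketch of a platform-aware alternative follows. The helper names `build_open_command` and `open_audio_file` are hypothetical and not part of the original snippet, and the Linux branch assumes a desktop with `xdg-open` installed.

```python
import subprocess
import sys


def build_open_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a freedesktop-style Linux desktop with xdg-open installed.
    return ["xdg-open", path]


def open_audio_file(path):
    """Launch the default player for `path` (hypothetical helper)."""
    # check=False: a missing player is reported by the OS, not raised here.
    subprocess.run(build_open_command(path), check=False)
```

Passing an argument list to `subprocess.run` instead of formatting a shell string also avoids quoting problems when the output file name contains spaces.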

Papers

Showing 151–200 of 1419 papers

Title | Status | Hype
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Code | 1
Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries | Code | 1
Multi-Task Learning for Front-End Text Processing in TTS | Code | 1
A Toolbox for Construction and Analysis of Speech Datasets | Code | 1
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Code | 1
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding | Code | 1
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP | Code | 1
AdaSpeech: Adaptive Text to Speech for Custom Voice | Code | 1
End to End Lip Synchronization with a Temporal AutoEncoder | Code | 1
An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer | Code | 1
End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Code | 1
MultiSpeech: Multi-Speaker Text to Speech with Transformer | Code | 1
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline | Code | 1
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Code | 1
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech | Code | 1
EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | Code | 1
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization | Code | 1
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts | Code | 1
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset | Code | 1
AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Code | 1
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data | Code | 1
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | Code | 1
Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention | Code | 1
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Code | 1
EfficientSpeech: An On-Device Text to Speech Model | Code | 1
EdiTTS: Score-based Editing for Controllable Text-to-Speech | Code | 1
Effective Deep Learning Models for Automatic Diacritization of Arabic Text | Code | 1
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Code | 1
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder | Code | 1
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Code | 1
Automatic Prosody Annotation with Pre-Trained Text-Speech Model | Code | 1
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Code | 1
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion | Code | 1
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS | Code | 1
End-to-End Adversarial Text-to-Speech | Code | 1
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Code | 1
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration | Code | 1
Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data | Code | 1
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | Code | 1
FastPitch: Parallel Text-to-speech with Pitch Prediction | Code | 1
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding | Code | 1
Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text | Code | 1
Dreamento: an open-source dream engineering toolbox for sleep EEG wearables | Code | 1
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Code | 1
MathReader : Text-to-Speech for Mathematical Documents | Code | 1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | Code | 1
BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Code | 1
Attention model for articulatory features detection | Code | 1
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | Code | 1
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Code | 1

No leaderboard results yet.