SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # Check that "ku" is listed by gtts.lang.tts_langs() before relying on it.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the string means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
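The snippet above opens the saved file with `start`, which works only on Windows. A minimal sketch of a cross-platform alternative follows. The helper names `build_open_command` and `open_audio` are hypothetical and not part of gTTS, and the fallback assumes a Linux-like desktop where `xdg-open` is installed.

```python
import platform
import subprocess


def build_open_command(path, system=None):
    """Return the command list that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string fills its window-title argument.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        # macOS ships the `open` command.
        return ["open", path]
    # Assumption: any other system is a Linux-like desktop that provides xdg-open.
    return ["xdg-open", path]


def open_audio(path):
    """Hypothetical helper: launch the platform's default player for `path`."""
    subprocess.run(build_open_command(path), check=False)
```

Calling `open_audio("output.mp3")` after `tts.save("output.mp3")` would replace the `os.system` line. Passing the command as a list also avoids shell quoting problems when the file name contains spaces.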

Papers

Showing 501-550 of 1419 papers

| Title | Status | Hype |
|---|---|---|
| An overview of text-to-speech systems and media applications | | 0 |
| Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | | 0 |
| Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | | 0 |
| Explicit Intensity Control for Accented Text-to-speech | | 0 |
| Efficient data selection employing Semantic Similarity-based Graph Structures for model training | | 0 |
| Exploiting Transliterated Words for Finding Similarity in Inter-Language News Articles using Machine Learning | | 0 |
| Exploring an Inter-Pausal Unit (IPU) based Approach for Indic End-to-End TTS Systems | | 0 |
| Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation | | 0 |
| Exploring Speech Enhancement for Low-resource Speech Synthesis | | 0 |
| Exploring speech style spaces with language models: Emotional TTS without emotion labels | | 0 |
| Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study | | 0 |
| Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment | | 0 |
| BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization | | 0 |
| An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era | | 0 |
| Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech | | 0 |
| Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition | | 0 |
| BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation | | 0 |
| Effective Decoder Masking for Transformer Based End-to-End Speech Recognition | | 0 |
| BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing | | 0 |
| A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | | 0 |
| Easy, Interpretable, Effective: openSMILE for voice deepfake detection | | 0 |
| E3 TTS: Easy End-to-End Diffusion-based Text to Speech | | 0 |
| A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation | | 0 |
| Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System | | 0 |
| Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | | 0 |
| E1 TTS: Simple and Fast Non-Autoregressive TTS | | 0 |
| Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection | | 0 |
| DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis | | 0 |
| Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling | | 0 |
| A Novel Approach to OCR using Image Recognition based Classification for Ancient Tamil Inscriptions in Temples | | 0 |
| DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis | | 0 |
| Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech | | 0 |
| DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | | 0 |
| Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | | 0 |
| Dual Supervised Learning | | 0 |
| DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance | | 0 |
| BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model | | 0 |
| Dual Script E2E framework for Multilingual and Code-Switching ASR | | 0 |
| Dual Audio-Centric Modality Coupling for Talking Head Generation | | 0 |
| Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy | | 0 |
| DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction | | 0 |
| DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | | 0 |
| Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 | | 0 |
| DPP-TTS: Diversifying prosodic features of speech via determinantal point processes | | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | | 0 |
| An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis | | 0 |
| Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis | | 0 |
| DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech | | 0 |
| Do Prosody Transfer Models Transfer Prosody? | | 0 |
| Does Audio Deepfake Detection Generalize? | | 0 |

No leaderboard results yet.