SOTAVerified

Text to Speech

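from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # Note: gTTS may reject "ku" if it is not in its supported-language list;
    # gtts.lang.tts_langs() returns the codes it accepts.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (on Windows)

# Example (the Kurdish string means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")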
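The snippet above opens the saved file with the Windows-only `start` command and builds that command by string interpolation, which breaks on file names containing spaces. A minimal sketch of a more portable alternative follows, using only the standard library. The helper names `build_open_command` and `open_file` are hypothetical, and the non-Windows, non-macOS branch assumes a desktop system where `xdg-open` is installed.

```python
import subprocess
import sys


def build_open_command(path, platform=None):
    """Return an argv list that opens `path` with the default application.

    Hypothetical helper for illustration; it is not part of gTTS.
    """
    platform = platform or sys.platform
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a Linux/BSD desktop with xdg-utils available.
    return ["xdg-open", path]


def open_file(path):
    """Run the platform's opener for `path`; failures are not raised."""
    subprocess.run(build_open_command(path), check=False)
```

Passing the command as a list rather than a formatted shell string means a file name such as `my file.mp3` is handed over as a single argument. In the function above, `os.system(f"start {output_file}")` could then be swapped for `open_file(output_file)`.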

Papers

Showing 1351–1400 of 1419 papers

Title | Status | Hype
Does Audio Deepfake Detection Generalize? | | 0
Do Prosody Transfer Models Transfer Prosody? | | 0
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech | | 0
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes | | 0
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | | 0
DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction | | 0
Dual Audio-Centric Modality Coupling for Talking Head Generation | | 0
Dual Script E2E framework for Multilingual and Code-Switching ASR | | 0
DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance | | 0
Dual Supervised Learning | | 0
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | | 0
Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech | | 0
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis | | 0
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis | | 0
Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection | | 0
E1 TTS: Simple and Fast Non-Autoregressive TTS | | 0
E3 TTS: Easy End-to-End Diffusion-based Text to Speech | | 0
Easy, Interpretable, Effective: openSMILE for voice deepfake detection | | 0
Effective Decoder Masking for Transformer Based End-to-End Speech Recognition | | 0
Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition | | 0
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment | | 0
Efficient data selection employing Semantic Similarity-based Graph Structures for model training | | 0
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | | 0
Efficient Incremental Text-to-Speech on GPUs | | 0
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS | | 0
Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch | | 0
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams | | 0
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering | | 0
EmoCat: Language-agnostic Emotional Voice Conversion | | 0
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance | | 0
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization | | 0
EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations | | 0
EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis | | 0
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions | | 0
Emotional Prosody Control for Speech Generation | | 0
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition | | 0
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model | | 0
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting | | 0
Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | | 0
Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | | 0
Emphasis control for parallel neural TTS | | 0
Emphasized Accent Phrase Prediction from Text for Advertisement Text-To-Speech Synthesis | | 0
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition | | 0
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | | 0
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech | | 0
End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator | | 0
End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2 | | 0
End-to-end speech recognition modeling from de-identified data | | 0
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue | | 0
End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning | | 0
Page 28 of 29

No leaderboard results yet.