
Text to Speech

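from gtts import gTTS  # the package is named "gtts"; gTTS is the class it exports
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (the language code "ku" is chosen for Kurdish).
    # gTTS checks lang against gtts.lang.tts_langs() and may raise ValueError
    # if the installed version does not list "ku" as a supported language.
    tts = gTTS(text=text, lang="ku", slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # open the saved audio file (Windows only)

# Example ("Hello, this is my voice in Kurdish."):
# text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")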
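The snippet above opens the saved MP3 with the Windows `start` command, so its last step fails on macOS and Linux. The following is a minimal sketch of a cross-platform replacement, assuming the standard `open` launcher on macOS and `xdg-open` (from xdg-utils) on Linux desktops; the helper names `player_command` and `open_audio` are hypothetical and not part of gTTS.

```python
import os
import subprocess
import sys


def player_command(path, platform_name=sys.platform):
    """Return the command list that opens `path` in the default player.

    Returns None on Windows, where os.startfile is used instead of a command.
    (Hypothetical helper; assumes `open` on macOS and `xdg-open` on Linux.)
    """
    if platform_name.startswith("win"):
        return None
    if platform_name == "darwin":
        return ["open", path]      # macOS default-application launcher
    return ["xdg-open", path]      # most Linux desktops (requires xdg-utils)


def open_audio(path):
    """Open an audio file, such as the MP3 written by gTTS, in the default player."""
    cmd = player_command(path)
    if cmd is None:
        os.startfile(path)         # Windows-only API in the standard library
    else:
        # Passing a list (no shell) avoids quoting problems with spaces in paths.
        subprocess.run(cmd, check=False)
```

Calling `open_audio(output_file)` in place of `os.system(f"start {output_file}")` leaves the rest of `text_to_speech_kurdish` unchanged. Keeping the command construction in `player_command` separates the platform decision from the side effect, so it can be checked without launching a player.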

Papers

Showing 151-200 of 1419 papers

Title | Status | Hype
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement | | 0
AI-Powered Assistive Technologies for Visual Impairment | | 0
MathReader : Text-to-Speech for Mathematical Documents | Code | 1
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control | | 0
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer | | 0
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | | 0
Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron | | 0
MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model | | 0
Probing Speaker-specific Features in Speaker Representations | | 0
Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model | | 0
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles | | 0
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT | | 0
RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer | Code | 2
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting | | 0
"I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities | Code | 0
Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID | | 0
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | | 0
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | | 0
Autoregressive Speech Synthesis with Next-Distribution Prediction | | 0
Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective | | 0
Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers | | 0
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling | | 0
Enhancing Naturalness in LLM-Generated Utterances through Disfluency Insertion | | 0
Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes | | 0
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis | | 0
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech | Code | 0
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | | 0
AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation | | 0
CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder | | 0
A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings | | 0
A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction | | 0
LatentSpeech: Latent Diffusion for Text-To-Speech Generation | | 0
Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration | | 0
Multimodal Latent Language Modeling with Next-Token Diffusion | Code | 0
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Code | 3
EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations | | 0
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles | | 0
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Code | 7
Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor | | 0
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation | | 0
Continual Learning in Machine Speech Chain Using Gradient Episodic Memory | | 0
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | | 0
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM | | 0
A Context-Based Numerical Format Prediction for a Text-To-Speech System | | 0
Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D | | 0
Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation | | 0
WavChat: A Survey of Spoken Dialogue Models | Code | 3
Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models | | 0
Debatts: Zero-Shot Debating Text-to-Speech Synthesis | | 0
CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | | 0

No leaderboard results yet.