SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # Assumption: gTTS accepts "ku"; gtts.lang.tts_langs() lists the supported codes.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (on Windows)

# Example:
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
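The snippet above opens the saved file with the Windows-only `start` command. A minimal sketch of a more portable alternative follows, assuming `open` is available on macOS and `xdg-open` on Linux desktops; the helper names `open_command` and `play_file` are hypothetical and not part of gTTS.

```python
import subprocess
import sys


def open_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default application.

    Hypothetical helper: assumes `open` exists on macOS and `xdg-open`
    exists on Linux desktops.
    """
    if platform.startswith("win"):
        # The empty string is the window title that `start` expects first.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]


def play_file(path):
    """Open the audio file with the platform's default player."""
    subprocess.run(open_command(path), check=False)
```

Passing the command as a list rather than an f-string avoids shell quoting problems when the output path contains spaces. Calling `play_file("output.mp3")` would replace the `os.system` line.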

Papers

Showing 251-300 of 1419 papers

Title | Status | Hype
Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling | | 0
Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech | | 0
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1
Zero-shot Cross-lingual Voice Transfer for TTS | | 0
On the Feasibility of Fully AI-automated Vishing Attacks | | 0
Preference Alignment Improves Language Model-Based TTS | | 0
Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space | | 0
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild | | 0
Exploring an Inter-Pausal Unit (IPU) based Approach for Indic End-to-End TTS Systems | | 0
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech | | 0
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference | | 0
Moshi: a speech-text foundation model for real-time dialogue | Code | 9
The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives | | 0
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora | | 0
SpMis: An Investigation of Synthetic Spoken Misinformation Detection | | 0
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion | | 0
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization | | 0
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning | | 0
E1 TTS: Simple and Fast Non-Autoregressive TTS | | 0
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation | | 0
SafeEar: Content Privacy-Preserving Audio Deepfake Detection | Code | 2
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation | | 0
HLTCOE JHU Submission to the Voice Privacy Challenge 2024 | | 0
Text-To-Speech Synthesis In The Wild | | 0
Full-text Error Correction for Chinese Speech Recognition with Large Language Model | | 0
Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT | | 0
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis | Code | 2
D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | | 0
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment | | 0
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach | | 0
VoiceWukong: Benchmarking Deepfake Voice Detection | | 0
What happens to diffusion model likelihood when your model is conditional? | | 0
AS-Speech: Adaptive Style For Speech Synthesis | | 0
IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Code | 2
LAST: Language Model Aware Speech Tokenization | | 0
Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems | | 0
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka | | 0
A Framework for Synthetic Audio Conversations Generation using Large Language Models | | 0
A multilingual training strategy for low resource Text to Speech | | 0
Sample-Efficient Diffusion for Text-To-Speech Synthesis | Code | 2
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | Code | 9
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection | | 0
AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | | 0
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Code | 3
Multi-modal Adversarial Training for Zero-Shot Voice Cloning | | 0
Easy, Interpretable, Effective: openSMILE for voice deepfake detection | | 0
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech | Code | 0
DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance | | 0
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models | | 0
Positional Description for Numerical Normalization | | 0
Page 6 of 29

No leaderboard results yet.