SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (the language code "ku" selects Kurdish).
    # gTTS raises ValueError if the code is not in its supported-language list.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (on Windows)

# Example (the text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
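The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a more portable way to pick the opener follows. It assumes `open` is available on macOS and `xdg-open` on Linux desktops; the helper names `player_command` and `play` are hypothetical and not part of gTTS.

```python
import subprocess
import sys


def player_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty "" fills its window-title slot.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a Linux/BSD desktop with xdg-utils installed.
    return ["xdg-open", path]


def play(path):
    """Open the audio file using an argv list instead of a shell string."""
    subprocess.run(player_command(path), check=False)
```

Passing an argument list rather than an interpolated shell string also avoids problems with output file names that contain spaces.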

Papers

Showing 451–500 of 1419 papers

Title | Status | Hype
Continuous Speech Synthesis using per-token Latent Diffusion |  | 0
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech | Code | 0
A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages |  | 0
Enhancing Crowdsourced Audio for Text-to-Speech Models |  | 0
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis |  | 0
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech |  | 0
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation |  | 0
ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs |  | 0
IsoChronoMeter: A simple and effective isochronic translation evaluation metric | Code | 0
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis |  | 0
Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling | Code | 0
Unsupervised Data Validation Methods for Efficient Model Training |  | 0
Can DeepFake Speech be Reliably Detected? |  | 0
Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch |  | 0
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS |  | 0
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech |  | 0
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis |  | 0
Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System |  | 0
Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens |  | 0
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech |  | 0
Generative Semantic Communication for Text-to-Speech Synthesis |  | 0
Augmentation through Laundering Attacks for Audio Spoof Detection |  | 0
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS |  | 0
Word-wise intonation model for cross-language TTS systems |  | 0
FluentEditor2: Text-based Speech Editing by Modeling Multi-Scale Acoustic and Prosody Consistency | Code | 0
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control |  | 0
Exploring synthetic data for cross-speaker style transfer in style representation based TTS |  | 0
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions |  | 0
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis |  | 0
Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling |  | 0
Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech |  | 0
On the Feasibility of Fully AI-automated Vishing Attacks |  | 0
Zero-shot Cross-lingual Voice Transfer for TTS |  | 0
Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space |  | 0
Preference Alignment Improves Language Model-Based TTS |  | 0
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference |  | 0
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech |  | 0
Exploring an Inter-Pausal Unit (IPU) based Approach for Indic End-to-End TTS Systems |  | 0
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild |  | 0
The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives |  | 0
SpMis: An Investigation of Synthetic Spoken Misinformation Detection |  | 0
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora |  | 0
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization |  | 0
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion |  | 0
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning |  | 0
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation |  | 0
E1 TTS: Simple and Fast Non-Autoregressive TTS |  | 0
Text-To-Speech Synthesis In The Wild |  | 0
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation |  | 0
HLTCOE JHU Submission to the Voice Privacy Challenge 2024 |  | 0
Page 10 of 29

No leaderboard results yet.