SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # Assumption: gTTS accepts "ku"; gtts.lang.tts_langs() lists the supported codes.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the string means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
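The snippet opens the saved file with the Windows `start` command, so playback only works on Windows. A minimal sketch of a portable alternative follows, using only the standard library. The helper names `player_command` and `open_audio` are hypothetical, and the fallback assumes a Linux-like desktop where `xdg-open` is installed.

```python
import subprocess
import sys


def player_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: other platforms are Linux-like desktops with xdg-open installed.
    return ["xdg-open", path]


def open_audio(path):
    """Open the audio file with the platform's default player."""
    subprocess.run(player_command(path), check=False)
```

Calling `open_audio(output_file)` in place of the `os.system` line would keep the rest of the function unchanged.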

Papers

Showing 501-550 of 1419 papers

| Title | Status | Hype |
| --- | --- | --- |
| Full-text Error Correction for Chinese Speech Recognition with Large Language Model | | 0 |
| Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment | | 0 |
| D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | | 0 |
| Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT | | 0 |
| VoiceWukong: Benchmarking Deepfake Voice Detection | | 0 |
| Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach | | 0 |
| What happens to diffusion model likelihood when your model is conditional? | | 0 |
| AS-Speech: Adaptive Style For Speech Synthesis | | 0 |
| LAST: Language Model Aware Speech Tokenization | | 0 |
| Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems | | 0 |
| VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka | | 0 |
| A Framework for Synthetic Audio Conversations Generation using Large Language Models | | 0 |
| A multilingual training strategy for low resource Text to Speech | | 0 |
| AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | | 0 |
| SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection | | 0 |
| Multi-modal Adversarial Training for Zero-Shot Voice Cloning | | 0 |
| Easy, Interpretable, Effective: openSMILE for voice deepfake detection | | 0 |
| StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech | Code | 0 |
| DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance | | 0 |
| SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models | | 0 |
| Positional Description for Numerical Normalization | | 0 |
| Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting | | 0 |
| kNN Retrieval for Simple and Effective Zero-Shot Multi-speaker Text-to-Speech | | 0 |
| Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition | Code | 0 |
| Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation | | 0 |
| SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis | Code | 0 |
| FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks | | 0 |
| VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing | | 0 |
| Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation | | 0 |
| On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | | 0 |
| Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks | | 0 |
| On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures | | 0 |
| Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model | | 0 |
| Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments | | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | | 0 |
| Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models | | 0 |
| Handling Numeric Expressions in Automatic Speech Recognition | | 0 |
| SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network | Code | 0 |
| A Language Modeling Approach to Diacritic-Free Hebrew TTS | | 0 |
| Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding | Code | 0 |
| Autoregressive Speech Synthesis without Vector Quantization | | 0 |
| Source Tracing of Audio Deepfake Systems | | 0 |
| Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation | Code | 0 |
| ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation | | 0 |
| On the Effectiveness of Acoustic BPE in Decoder-Only TTS | | 0 |
| Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis | | 0 |
| Optimizing a-DCF for Spoofing-Robust Speaker Verification | | 0 |
| Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | | 0 |
| TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations | | 0 |
| Lightweight Zero-shot Text-to-Speech with Mixture of Adapters | | 0 |
Page 11 of 29

No leaderboard results yet.