SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" is chosen for Kurdish).
    # gTTS raises ValueError if the code is not in its supported list
    # (see gtts.lang.tts_langs()).
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (on Windows)

# Example (the text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
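The snippet opens the saved file with the Windows `start` command only. Here is a minimal sketch of a platform-aware alternative, assuming macOS provides `open` and a Linux desktop provides `xdg-open`. The helper names `player_command` and `open_audio` are hypothetical and are not part of gTTS.

```python
import subprocess
import sys


def player_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default application.

    Hypothetical helper for illustration; it is not part of gTTS.
    """
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a freedesktop-style Linux desktop with xdg-open installed.
    return ["xdg-open", path]


def open_audio(path):
    """Open the audio file without building a shell command string."""
    subprocess.run(player_command(path), check=False)
```

Passing the command as a list avoids quoting problems when the output file name contains spaces.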

Papers

Showing 301-350 of 1419 papers

| Title | Status | Hype |
|---|---|---|
| Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting | | 0 |
| kNN Retrieval for Simple and Effective Zero-Shot Multi-speaker Text-to-Speech | | 0 |
| Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition | Code | 0 |
| PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Code | 3 |
| Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation | | 0 |
| SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis | Code | 0 |
| PRESENT: Zero-Shot Text-to-Prosody Control | Code | 1 |
| FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks | | 0 |
| VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing | | 0 |
| ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Code | 1 |
| Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation | | 0 |
| On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | | 0 |
| Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks | | 0 |
| On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures | | 0 |
| Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model | | 0 |
| Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments | | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | | 0 |
| Handling Numeric Expressions in Automatic Speech Recognition | | 0 |
| Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models | | 0 |
| SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network | Code | 0 |
| Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Code | 1 |
| TTSDS -- Text-to-Speech Distribution Score | Code | 2 |
| A Language Modeling Approach to Diacritic-Free Hebrew TTS | | 0 |
| Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding | Code | 0 |
| Autoregressive Speech Synthesis without Vector Quantization | | 0 |
| Source Tracing of Audio Deepfake Systems | | 0 |
| ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation | | 0 |
| Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation | Code | 0 |
| CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Code | 11 |
| Optimizing a-DCF for Spoofing-Robust Speaker Verification | | 0 |
| Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis | | 0 |
| On the Effectiveness of Acoustic BPE in Decoder-Only TTS | | 0 |
| CATT: Character-based Arabic Tashkeel Transformer | Code | 2 |
| TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations | | 0 |
| Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | | 0 |
| Lightweight Zero-shot Text-to-Speech with Mixture of Adapters | | 0 |
| FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis | | 0 |
| NAIST Simultaneous Speech Translation System for IWSLT 2024 | | 0 |
| Open-Source Conversational AI with SpeechBrain 1.0 | | 0 |
| Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models | | 0 |
| DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Code | 2 |
| E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS | Code | 1 |
| Automatic Speech Recognition for Hindi | | 0 |
| LLM-Driven Multimodal Opinion Expression Identification | | 0 |
| High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model | | 0 |
| Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation | | 0 |
| Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment | | 0 |
| Towards Zero-Shot Text-To-Speech for Arabic Dialects | | 0 |
| A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge | | 0 |
| TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers | Code | 1 |

No leaderboard results yet.