SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish ("ku" selects Kurdish).
    # Assumption: confirm that "ku" is listed by gtts.lang.tts_langs() for your gTTS version.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f'start "" "{output_file}"')  # Open the audio file (Windows only)

# Example (the Kurdish string means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
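The snippet above plays the generated file through the Windows `start` command, so it does nothing useful on macOS or Linux. The following is a minimal sketch of one way to pick an opener per platform, using only the standard library. The helper names `open_command` and `open_file` are hypothetical, and the fallback assumes a Linux-like desktop where `xdg-open` is installed.

```python
import platform
import subprocess


def open_command(path, system=None):
    """Return the argv list that opens `path` with the default application.

    `system` defaults to platform.system(); pass it explicitly to preview
    the command for another platform.
    """
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: any other system is a Linux-like desktop with xdg-open.
    return ["xdg-open", path]


def open_file(path):
    """Launch the default player for `path`; errors are not raised."""
    subprocess.run(open_command(path), check=False)
```

Calling `open_file("output.mp3")` after `tts.save(...)` would replace the `os.system` line. Passing the command as a list avoids shell quoting problems with file names that contain spaces.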

Papers

Showing 501–550 of 1419 papers

| Title | Status | Hype |
| --- | --- | --- |
| Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling | Code | 2 |
| Crowdsourced and Automatic Speech Prominence Estimation | Code | 1 |
| On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition |  | 0 |
| Prosody Analysis of Audiobooks | Code | 0 |
| Neutral TTS Female Voice Corpus in Brazilian Portuguese |  | 0 |
| Unified speech and gesture synthesis using flow matching |  | 0 |
| Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset |  | 0 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2 |
| Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis |  | 0 |
| The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains |  | 0 |
| Towards human-like spoken dialogue generation between AI agents from written dialogue |  | 0 |
| Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Code | 1 |
| Synthetic Speech Detection Based on Temporal Consistency and Distribution of Speaker Features |  | 0 |
| Low-Resource Self-Supervised Learning with SSL-Enhanced TTS |  | 0 |
| High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models |  | 0 |
| Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping |  | 0 |
| BiSinger: Bilingual Singing Voice Synthesis | Code | 1 |
| VoiceLDM: Text-to-Speech with Environmental Context |  | 0 |
| DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis |  | 0 |
| Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Code | 1 |
| The Impact of Silence on Speech Anti-Spoofing |  | 0 |
| Speak While You Think: Streaming Speech Synthesis During Text Generation |  | 0 |
| Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model | Code | 1 |
| Exploring Speech Enhancement for Low-resource Speech Synthesis |  | 0 |
| Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition |  | 0 |
| Augmenting text for spoken language understanding with Large Language Models |  | 0 |
| HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods | Code | 1 |
| PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions |  | 0 |
| Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech |  | 0 |
| FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Code | 2 |
| Direct Text to Speech Translation System using Acoustic Units |  | 0 |
| Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP | Code | 1 |
| VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching | Code | 2 |
| Cross-Utterance Conditioned VAE for Speech Generation |  | 0 |
| Large-Scale Automatic Audiobook Creation |  | 0 |
| MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023 |  | 0 |
| GRASS: Unified Generation Model for Speech-to-Semantic Tasks |  | 0 |
| PromptTTS 2: Describing and Generating Voices with Text Prompt |  | 0 |
| A Comparative Analysis of Pretrained Language Models for Text-to-Speech |  | 0 |
| The FruitShell French synthesis system at the Blizzard 2023 Challenge |  | 0 |
| Learning Speech Representation From Contrastive Token-Acoustic Pretraining |  | 0 |
| QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Code | 1 |
| SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models | Code | 2 |
| Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis |  | 0 |
| Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information |  | 0 |
| The DeepZen Speech Synthesis System for Blizzard Challenge 2023 |  | 0 |
| Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech |  | 0 |
| TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models | Code | 1 |
| Rep2wav: Noise Robust text-to-speech Using self-supervised representations |  | 0 |
| Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations |  | 0 |

No leaderboard results yet.