SOTAVerified

Text to Speech

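from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # With its default language check, gTTS raises ValueError if "ku" is not
    # among the language codes supported by the installed version.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the text means "Hello, this is my voice in Kurdish."):
# text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")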
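The snippet above opens the saved file with the Windows-only `start` command. A minimal sketch of a cross-platform alternative follows, using only the standard library. The helper names `build_open_command` and `open_audio` are hypothetical and do not come from the original snippet, and the fallback assumes a Linux-like desktop where `xdg-open` is installed.

```python
import platform
import subprocess


def build_open_command(path, system=None):
    """Return the argument list that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: any other system is a Linux-like desktop with xdg-open.
    return ["xdg-open", path]


def open_audio(path):
    """Open the audio file without building a shell command string."""
    subprocess.run(build_open_command(path), check=False)
```

Calling `open_audio("output.mp3")` after `tts.save(...)` would replace the `os.system` line. Passing an argument list instead of an f-string also avoids quoting problems when the file name contains spaces.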

Papers

Showing 101-150 of 1419 papers

Title | Status | Hype
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Code | 2
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Code | 2
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality | Code | 2
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | Code | 2
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Code | 2
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment | Code | 2
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Code | 2
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation | Code | 1
MathReader : Text-to-Speech for Mathematical Documents | Code | 1
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations | Code | 1
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | Code | 1
Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | Code | 1
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining | Code | 1
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Code | 1
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Code | 1
Learning to Dub Movies via Hierarchical Prosody Models | Code | 1
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset | Code | 1
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech | Code | 1
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis | Code | 1
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Code | 1
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | Code | 1
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Code | 1
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Code | 1
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search | Code | 1
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | Code | 1
Mitigating Unauthorized Speech Synthesis for Voice Protection | Code | 1
In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data | Code | 1
Improving fairness for spoken language understanding in atypical speech with Text-to-Speech | Code | 1
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Code | 1
Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation | Code | 1
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Code | 1
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Code | 1
HUI-Audio-Corpus-German: A high quality TTS dataset | Code | 1
IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Code | 1
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation | Code | 1
A Character-level Span-based Model for Mandarin Prosodic Structure Prediction | Code | 1
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods | Code | 1
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech | Code | 1
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Code | 1
Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview | Code | 1
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search | Code | 1
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech | Code | 1
GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions | Code | 1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | Code | 1
g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset | Code | 1
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Code | 1
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning | Code | 1
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint | Code | 1
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Code | 1
Page 3 of 29

No leaderboard results yet.