SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # gTTS raises ValueError if it does not support the language code.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file (Windows only).
    os.system(f"start {output_file}")

# Example (the text reads: "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
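The snippet opens the saved file with the `start` command, which exists only on Windows. A minimal sketch of a portable alternative follows, using only the Python standard library. The helper names `build_open_command` and `open_audio` are hypothetical, and the sketch assumes that `open` (macOS) or `xdg-open` (most Linux desktops) is available on the target machine.

```python
import platform
import subprocess


def build_open_command(path, system=None):
    """Return the argv list that opens `path` with the default application.

    `system` defaults to platform.system(); passing it explicitly makes the
    function easy to check without touching the real desktop.
    """
    system = system or platform.system()
    if system == "Windows":
        # "start" is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: any other platform is a Linux/BSD desktop with xdg-open.
    return ["xdg-open", path]


def open_audio(path):
    """Open the audio file with the platform's default player (best effort)."""
    subprocess.run(build_open_command(path), check=False)
```

Passing the path as a separate list element, rather than interpolating it into a shell string, also avoids problems with spaces in file names.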

Papers

Showing 101–150 of 1419 papers

Title | Status | Hype
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism | Code | 2
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram | Code | 2
FastSpeech: Fast, Robust and Controllable Text-to-Speech | Code | 2
FastSpeech: Fast, Robust and Controllable Text to Speech | Code | 2
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction | Code | 2
Neural Speech Synthesis with Transformer Network | Code | 2
Efficient Neural Audio Synthesis | Code | 2
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Code | 1
GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions | Code | 1
UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information | Code | 1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | Code | 1
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | Code | 1
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Code | 1
Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet | Code | 1
MathReader : Text-to-Speech for Mathematical Documents | Code | 1
Mitigating Unauthorized Speech Synthesis for Voice Protection | Code | 1
STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Code | 1
Where are we in audio deepfake detection? A systematic analysis over generative and detection models | Code | 1
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1
PRESENT: Zero-Shot Text-to-Prosody Control | Code | 1
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Code | 1
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Code | 1
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS | Code | 1
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers | Code | 1
AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Code | 1
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model | Code | 1
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts | Code | 1
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach | Code | 1
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Code | 1
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis | Code | 1
Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
Multi-Task Learning for Front-End Text Processing in TTS | Code | 1
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism | Code | 1
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Code | 1
Improving fairness for spoken language understanding in atypical speech with Text-to-Speech | Code | 1
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Code | 1
ArTST: Arabic Text and Speech Transformer | Code | 1
Crowdsourced and Automatic Speech Prominence Estimation | Code | 1
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Code | 1
BiSinger: Bilingual Singing Voice Synthesis | Code | 1
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Code | 1
Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model | Code | 1
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods | Code | 1
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP | Code | 1
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Code | 1
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models | Code | 1
Towards an AI to Win Ghana's National Science and Maths Quiz | Code | 1
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Code | 1
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Code | 1

No leaderboard results yet.