SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (selecting language "ku" for Kurdish).
    # Note: gTTS may not list "ku" as a supported language; check gtts.lang.tts_langs() first.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (on Windows)

# Example (the Kurdish text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
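The snippet above opens the saved audio file with `start`, which only exists on Windows. A minimal sketch of a cross-platform alternative follows. The helper names `open_command` and `open_file` are hypothetical, and the fallback branch assumes a desktop system with `xdg-open` available; this is one possible approach, not part of the original snippet.

```python
import subprocess
import sys


def open_command(path, platform=sys.platform):
    """Return the argument list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title,
        # which keeps a quoted path from being mistaken for the title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        # macOS ships the `open` command.
        return ["open", path]
    # Assumption: any other platform is a desktop with xdg-utils installed.
    return ["xdg-open", path]


def open_file(path):
    """Open `path` using an argument list instead of an interpolated shell string."""
    subprocess.run(open_command(path), check=False)
```

Calling `open_file("output.mp3")` after `tts.save(...)` would replace the `os.system` line. Passing the path as a list element also avoids problems with spaces in file names.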

Papers

Showing 51-100 of 1419 papers

Title | Status | Hype
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument | Code | 2
RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer | Code | 2
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector | Code | 2
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2
Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier | Code | 2
Recent Advances in Speech Language Models: A Survey | Code | 2
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Code | 2
SafeEar: Content Privacy-Preserving Audio Deepfake Detection | Code | 2
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis | Code | 2
IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Code | 2
Sample-Efficient Diffusion for Text-To-Speech Synthesis | Code | 2
TTSDS -- Text-to-Speech Distribution Score | Code | 2
CATT: Character-based Arabic Tashkeel Transformer | Code | 2
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Code | 2
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors | Code | 2
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning | Code | 2
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech | Code | 2
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark | Code | 2
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Code | 2
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | Code | 2
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations | Code | 2
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Code | 2
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models | Code | 2
An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation | Code | 2
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation | Code | 2
PAM: Prompting Audio-Language Models for Audio Quality Assessment | Code | 2
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment | Code | 2
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling | Code | 2
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Code | 2
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching | Code | 2
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models | Code | 2
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Code | 2
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design | Code | 2
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model | Code | 2
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis | Code | 2
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Code | 2
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS | Code | 2
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | Code | 2
Towards Building Text-To-Speech Systems for the Next Billion Users | Code | 2
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform | Code | 2
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech | Code | 2
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis | Code | 2
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech | Code | 2
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality | Code | 2
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis | Code | 2
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Code | 2
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform | Code | 2
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows | Code | 2
PortaSpeech: Portable and High-Quality Generative Text-to-Speech | Code | 2
Page 2 of 29

No leaderboard results yet.