SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" chosen for Kurdish).
    # Assumption: gTTS accepts "ku"; verify with gtts.lang.tts_langs() first.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the Kurdish text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
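The `start` command used above to open the audio file works only on Windows. A minimal sketch of a platform-aware alternative follows, using only the standard library. The helper names `build_open_command` and `open_audio_file` are hypothetical and not part of gTTS, and the fallback to `xdg-open` assumes a Linux-like desktop.

```python
import subprocess
import sys


def build_open_command(path, platform=sys.platform):
    """Return the command that opens `path` with the default application.

    Hypothetical helper for illustration; it only builds the command list.
    """
    if platform.startswith("win"):
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: any other platform is a Linux-like desktop with xdg-open.
    return ["xdg-open", path]


def open_audio_file(path):
    """Open the audio file with the system's default player."""
    subprocess.run(build_open_command(path), check=False)
```

Calling `open_audio_file("output.mp3")` in place of the `os.system` line would pick the command for the current platform. Passing the path as a list element also avoids shell quoting problems with file names that contain spaces.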

Papers

Showing 101-150 of 1419 papers

Title | Status | Hype
PresentAgent: Multimodal Agent for Presentation Video Generation | Code | 2
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Code | 2
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment | Code | 2
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Code | 2
RWKVTTS: Yet another TTS based on RWKV-7 | Code | 2
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models | Code | 2
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | Code | 1
g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset | Code | 1
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations | Code | 1
MathReader : Text-to-Speech for Mathematical Documents | Code | 1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | Code | 1
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Code | 1
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation | Code | 1
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Code | 1
Fine-grained style control in Transformer-based Text-to-speech Synthesis | Code | 1
Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | Code | 1
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Code | 1
FastPitch: Parallel Text-to-speech with Pitch Prediction | Code | 1
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech | Code | 1
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint | Code | 1
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Code | 1
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis | Code | 1
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Code | 1
Mitigating Unauthorized Speech Synthesis for Voice Protection | Code | 1
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Code | 1
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding | Code | 1
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Code | 1
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search | Code | 1
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | Code | 1
End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Code | 1
End to End Lip Synchronization with a Temporal AutoEncoder | Code | 1
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Code | 1
A Character-level Span-based Model for Mandarin Prosodic Structure Prediction | Code | 1
End-to-End Adversarial Text-to-Speech | Code | 1
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Code | 1
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining | Code | 1
Learning to Dub Movies via Hierarchical Prosody Models | Code | 1
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Code | 1
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search | Code | 1
EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | Code | 1
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning | Code | 1
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Code | 1
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Code | 1
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Code | 1
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Code | 1
Effective Deep Learning Models for Automatic Diacritization of Arabic Text | Code | 1
EdiTTS: Score-based Editing for Controllable Text-to-Speech | Code | 1
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion | Code | 1
Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries | Code | 1
Page 3 of 29

No leaderboard results yet.