SOTAVerified

Text to Speech

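from gtts import gTTS
import os


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # Note: "ku" may not be in gTTS's supported-language list; check
    # gtts.lang.tts_langs() first, since gTTS raises ValueError otherwise.
    tts = gTTS(text=text, lang="ku", slow=False)
    tts.save(output_file)
    # Open the audio file with the default player (Windows only).
    os.system(f"start {output_file}")


# Example ("Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")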
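The snippet opens the saved file with the Windows-only `start` command. As a rough sketch of how the same step might be made portable, the helper below picks an opener per platform using only the standard library. The names `opener_command` and `open_audio` are hypothetical and not part of gTTS, and the Linux branch assumes `xdg-open` is installed.

```python
import subprocess
import sys


def opener_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe built-in; the empty string is its window-title argument.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a Linux/BSD desktop where xdg-utils provides `xdg-open`.
    return ["xdg-open", path]


def open_audio(path):
    """Launch the default player for `path` without waiting for it to exit."""
    subprocess.Popen(opener_command(path))
```

Passing the arguments as a list rather than through `os.system` avoids shell quoting problems when the output file name contains spaces.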

Papers

Showing 151–200 of 1419 papers

Title | Status | Hype
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset | Code | 1
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining | Code | 1
Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation | Code | 1
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder | Code | 1
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Code | 1
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Code | 1
An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer | Code | 1
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods | Code | 1
IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Code | 1
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Code | 1
Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview | Code | 1
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech | Code | 1
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization | Code | 1
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Code | 1
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint | Code | 1
g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset | Code | 1
GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions | Code | 1
AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Code | 1
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Code | 1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | Code | 1
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data | Code | 1
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | Code | 1
Fine-grained style control in Transformer-based Text-to-speech Synthesis | Code | 1
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search | Code | 1
FastPitch: Parallel Text-to-speech with Pitch Prediction | Code | 1
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation | Code | 1
AdaSpeech: Adaptive Text to Speech for Custom Voice | Code | 1
HUI-Audio-Corpus-German: A high quality TTS dataset | Code | 1
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech | Code | 1
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech | Code | 1
Automatic Prosody Annotation with Pre-Trained Text-Speech Model | Code | 1
Improving fairness for spoken language understanding in atypical speech with Text-to-Speech | Code | 1
Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries | Code | 1
In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data | Code | 1
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Code | 1
Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech | Code | 1
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Code | 1
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis | Code | 1
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Code | 1
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Code | 1
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding | Code | 1
Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text | Code | 1
End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Code | 1
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Code | 1
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Code | 1
Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | Code | 1
BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Code | 1
End-to-End Adversarial Text-to-Speech | Code | 1
Attention model for articulatory features detection | Code | 1
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | Code | 1

No leaderboard results yet.