SOTAVerified

Text to Speech

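from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # Assumption to verify: "ku" may not be among gTTS's supported languages;
    # check gtts.lang.tts_langs() before relying on it.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the Kurdish string reads: "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")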
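The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a more portable approach follows; `build_open_command` and `open_file` are hypothetical helper names, and the use of `xdg-open` assumes a typical Linux desktop where that tool is installed.

```python
import platform
import subprocess


def build_open_command(path, system=None):
    """Return the argv list that opens `path` with the default application.

    `system` defaults to platform.system(); it is a parameter so the
    mapping can be checked without running on each operating system.
    """
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: other platforms provide xdg-open (common on Linux desktops).
    return ["xdg-open", path]


def open_file(path):
    """Open `path` with the platform's default handler (hypothetical helper)."""
    subprocess.run(build_open_command(path), check=False)
```

Passing the command as a list avoids shell quoting problems when the output file name contains spaces.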

Papers

Showing 251-300 of 1419 papers

Title | Status | Hype
End to End Lip Synchronization with a Temporal AutoEncoder | Code | 1
Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities | Code | 1
Where are we in audio deepfake detection? A systematic analysis over generative and detection models | Code | 1
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Code | 1
SpeechLMScore: Evaluating speech generation using speech language model | Code | 1
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer | Code | 1
Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data | Code | 1
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model | Code | 1
Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet | Code | 1
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech | Code | 1
Deep Learning Based Assessment of Synthetic Speech Naturalness | Code | 1
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis | Code | 1
Crowdsourced and Automatic Speech Prominence Estimation | Code | 1
ArTST: Arabic Text and Speech Transformer | Code | 1
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech | Code | 1
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech | Code | 1
End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Code | 1
RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis | Code | 1
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Code | 1
Semi-Supervised Neural Architecture Search | Code | 1
Dreamento: an open-source dream engineering toolbox for sleep EEG wearables | Code | 1
QSpeech: Low-Qubit Quantum Speech Application Toolkit | Code | 0
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech | Code | 0
PromptTTS: Controllable Text-to-Speech with Text Descriptions | Code | 0
Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022 | Code | 0
Prosody Analysis of Audiobooks | Code | 0
PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset | Code | 0
AraSpot: Arabic Spoken Command Spotting | Code | 0
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis | Code | 0
Predicting distributions with Linearizing Belief Networks | Code | 0
A Fully Time-domain Neural Model for Subband-based Speech Synthesizer | Code | 0
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish | Code | 0
A Practical Guide to Logical Access Voice Presentation Attack Detection | Code | 0
On the Discrepancy between Density Estimation and Sequence Generation | Code | 0
Numbers Normalisation in the Inflected Languages: a Case Study of Polish | Code | 0
Applying Phonological Features in Multilingual Text-To-Speech | Code | 0
ObamaNet: Photo-realistic lip-sync from text | Code | 0
A Comparative Study on Transformer vs RNN in Speech Applications | Code | 0
Non-Autoregressive Neural Text-to-Speech | Code | 0
Naturalization of Text by the Insertion of Pauses and Filler Words | Code | 0
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech | Code | 0
Neural Voice Puppetry: Audio-driven Facial Reenactment | Code | 0
RNN Approaches to Text Normalization: A Challenge | Code | 0
MLS: A Large-Scale Multilingual Dataset for Speech Research | Code | 0
Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers | Code | 0
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech | Code | 0
Meta Learning Text-to-Speech Synthesis in over 7000 Languages | Code | 0
Massively Multilingual Neural Grapheme-to-Phoneme Conversion | Code | 0
MelNet: A Generative Model for Audio in the Frequency Domain | Code | 0
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible | Code | 0

No leaderboard results yet.