SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # Note: gTTS may not accept "ku"; gtts.lang.tts_langs() lists the supported codes.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file (Windows only).
    os.system(f"start {output_file}")

# Example (the text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
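The snippet above opens the saved file with `start`, which only exists on Windows. As a rough sketch of a portable alternative, the helper below picks an opener command per platform. The function names `build_open_command` and `open_audio_file` are hypothetical, and the use of `xdg-open` on Linux is an assumption about the desktop environment, not something the original states.

```python
import platform
import subprocess


def build_open_command(path, system=None):
    """Return the argument list used to open `path` with the default app.

    `system` defaults to platform.system(); it is a parameter only so the
    choice of command can be checked without running anything.
    """
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        # macOS ships the `open` command.
        return ["open", path]
    # Assumption: a Linux desktop with xdg-utils installed.
    return ["xdg-open", path]


def open_audio_file(path):
    """Open the audio file with the platform's default player (best effort)."""
    subprocess.run(build_open_command(path), check=False)
```

Passing the path as a separate list element, rather than interpolating it into a shell string, also avoids problems with file names that contain spaces.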

Papers

Showing 1351-1400 of 1419 papers

Title | Status | Hype
RNN Approaches to Text Normalization: A Challenge | Code | 0
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis | Code | 0
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | Code | 0
A Comparative Study on Transformer vs RNN in Speech Applications | Code | 0
Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis | Code | 0
Non-Autoregressive Neural Text-to-Speech | Code | 0
ObamaNet: Photo-realistic lip-sync from text | Code | 0
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment | Code | 0
Numbers Normalisation in the Inflected Languages: a Case Study of Polish | Code | 0
ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks | Code | 0
BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset | Code | 0
Neural Voice Puppetry: Audio-driven Facial Reenactment | Code | 0
Integrated Speech and Gesture Synthesis | Code | 0
Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation | Code | 0
Independent and automatic evaluation of acoustic-to-articulatory inversion models | Code | 0
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks | Code | 0
Naturalization of Text by the Insertion of Pauses and Filler Words | Code | 0
Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation | Code | 0
Deep Voice 2: Multi-Speaker Neural Text-to-Speech | Code | 0
SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis | Code | 0
Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech | Code | 0
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages | Code | 0
AraSpot: Arabic Spoken Command Spotting | Code | 0
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech | Code | 0
Multimodal Latent Language Modeling with Next-Token Diffusion | Code | 0
Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment | Code | 0
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech | Code | 0
Continuous Speech Tokenizer in Text To Speech | Code | 0
MLS: A Large-Scale Multilingual Dataset for Speech Research | Code | 0
Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers | Code | 0
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | Code | 0
High Fidelity Speech Synthesis with Adversarial Networks | Code | 0
A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Code | 0
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation | Code | 0
Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging | Code | 0
VIFS: An End-to-End Variational Inference for Foley Sound Synthesis | Code | 0
Semantic Mask for Transformer based End-to-End Speech Recognition | Code | 0
Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems | Code | 0
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS | Code | 0
Hierarchical Generative Modeling for Controllable Speech Synthesis | Code | 0
MelNet: A Generative Model for Audio in the Frequency Domain | Code | 0
Using generative modelling to produce varied intonation for speech synthesis | Code | 0
Applying Phonological Features in Multilingual Text-To-Speech | Code | 0
Massively Multilingual Neural Grapheme-to-Phoneme Conversion | Code | 0
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible | Code | 0
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech | Code | 0
Sequence Transduction with Recurrent Neural Networks | Code | 0
Audio Super Resolution using Neural Networks | Code | 0
Generating Synthetic Speech from SpokenVocab for Speech Translation | Code | 0
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition | Code | 0

No leaderboard results yet.