SOTAVerified

Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # Note: "ku" may not be among gTTS's supported codes; gtts.lang.tts_langs() lists them.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (on Windows)

# Example ("Hello, this is my voice in Kurdish."):
# text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
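The snippet above opens the saved file with the Windows-only `start` command. As a rough sketch of how that step might be made portable, the hypothetical helper below (the names `opener_command` and `open_audio` are not part of gTTS or the original snippet) picks an opener per platform. It assumes `open` is available on macOS and `xdg-open` on a Linux desktop.

```python
import subprocess
import sys


def opener_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default app (hypothetical helper)."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty "" is its window-title argument.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a freedesktop-style Linux desktop with xdg-open installed.
    return ["xdg-open", path]


def open_audio(path):
    """Launch the platform's default player for `path` without building a shell string."""
    subprocess.run(opener_command(path), check=False)
```

Passing the path as a list element rather than interpolating it into a shell string also avoids problems with file names that contain spaces.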

Papers

Showing 951-1000 of 1419 papers

Title | Status | Hype
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | - | 0
SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis | - | 0
Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck | - | 0
Deliberation Model for On-Device Spoken Language Understanding | - | 0
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature | - | 0
Text-To-Speech Data Augmentation for Low Resource Speech Recognition | - | 0
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios | - | 0
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis | - | 0
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset | - | 0
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech | - | 0
Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition | - | 0
Does Audio Deepfake Detection Generalize? | - | 0
Applying Syntax-Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis | - | 0
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus | - | 0
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent | - | 0
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge | - | 0
A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis | - | 0
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling | - | 0
ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis | Code | 0
Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise | - | 0
Improve few-shot voice cloning using multi-modal learning | - | 0
Text-free non-parallel many-to-many voice conversion using normalising flows | - | 0
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features | - | 0
Revisiting Over-Smoothness in Text to Speech | - | 0
Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video | - | 0
Improving Cross-lingual Speech Synthesis with Triplet Training Scheme | - | 0
r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation | - | 0
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech | - | 0
Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module | - | 0
Unsupervised word-level prosody tagging for controllable speech synthesis | - | 0
NewsPod: Automatic and Interactive News Podcasts | - | 0
Distribution augmentation for low-resource expressive text-to-speech | - | 0
Deep Performer: Score-to-Audio Music Performance Synthesis | - | 0
Cross-speaker style transfer for text-to-speech using data augmentation | - | 0
Building Synthetic Speaker Profiles in Text-to-Speech Systems | - | 0
Multi-Stage Deep Transfer Learning for EmIoT-enabled Human-Computer Interaction | - | 0
Transformer-based Models of Text Normalization for Speech Applications | - | 0
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs | - | 0
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition | - | 0
The MSXF TTS System for ICASSP 2022 ADD Challenge | - | 0
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention | - | 0
Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end | - | 0
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training | - | 0
Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | - | 0
KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics | - | 0
A Practical Guide to Logical Access Voice Presentation Attack Detection | - | 0
A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Code | 0
SoK: A Study of the Security on Voice Processing Systems | - | 0
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios | - | 0
Multi-speaker Emotional Text-to-speech Synthesizer | - | 0
Page 20 of 29

No leaderboard results yet.