SOTAVerified

Text to Speech

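from gtts import gTTS
import os


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # Note: gTTS raises ValueError if "ku" is not among its supported
    # languages; gtts.lang.tts_langs() lists the codes it accepts.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)


# Example (the Kurdish text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")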
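The snippet above opens the saved audio file with `start`, which exists only on Windows. A minimal sketch of a portable alternative follows, using only the standard library. The helper names `open_command` and `open_file` are hypothetical, and the fallback to `xdg-open` assumes a Linux-like desktop where that tool is installed.

```python
import platform
import subprocess


def open_command(path, system=None):
    """Return the argv list that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: any other system is a Linux-like desktop with xdg-open.
    return ["xdg-open", path]


def open_file(path):
    """Launch the default application for `path` without blocking the caller."""
    subprocess.Popen(open_command(path))
```

Calling `open_file("output.mp3")` after `tts.save(...)` would then replace the `os.system` line. Passing the path as a separate list element also avoids shell-quoting problems with file names that contain spaces.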

Papers

Showing 851–900 of 1419 papers

Title | Status | Hype
Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch | - | 0
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance | - | 0
Fine-grained Noise Control for Multispeaker Speech Synthesis | - | 0
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech | - | 0
Karaoker: Alignment-free singing voice synthesis with speech training data | - | 0
Arabic Text-To-Speech (TTS) Data Preparation | - | 0
Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis | - | 0
SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis | - | 0
Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification | - | 0
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | - | 0
Deliberation Model for On-Device Spoken Language Understanding | - | 0
Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck | - | 0
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature | - | 0
Text-To-Speech Data Augmentation for Low Resource Speech Recognition | - | 0
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios | - | 0
An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer | Code | 1
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset | - | 0
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis | - | 0
A Character-level Span-based Model for Mandarin Prosodic Structure Prediction | Code | 1
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech | Code | 1
Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition | - | 0
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech | - | 0
End to End Lip Synchronization with a Temporal AutoEncoder | Code | 1
Does Audio Deepfake Detection Generalize? | - | 0
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition | Code | 1
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Code | 2
Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis | - | 0
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus | - | 0
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent | - | 0
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge | - | 0
A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis | - | 0
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling | - | 0
Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise | - | 0
ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis | Code | 0
Improve few-shot voice cloning using multi-modal learning | - | 0
Text-free non-parallel many-to-many voice conversion using normalising flows | - | 0
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features | - | 0
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform | Code | 2
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows | Code | 2
Revisiting Over-Smoothness in Text to Speech | - | 0
Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video | - | 0
Improving Cross-lingual Speech Synthesis with Triplet Training Scheme | - | 0
r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation | - | 0
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech | - | 0
Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module | - | 0
Unsupervised word-level prosody tagging for controllable speech synthesis | - | 0
NewsPod: Automatic and Interactive News Podcasts | - | 0
Distribution augmentation for low-resource expressive text-to-speech | - | 0
Deep Performer: Score-to-Audio Music Performance Synthesis | - | 0
Cross-speaker style transfer for text-to-speech using data augmentation | - | 0

No leaderboard results yet.