SOTAVerified

Text to Speech

from gtts import gTTS
import os


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # gTTS raises ValueError if "ku" is not among its supported languages.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file (Windows only).
    os.system(f"start {output_file}")


# Example (the text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
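The snippet above opens the saved file with `start`, which exists only on Windows. A minimal sketch of a cross-platform alternative follows, assuming `open` is available on macOS and `xdg-open` on Linux desktops. The helper names `build_open_command` and `open_audio_file` are hypothetical and not part of the original snippet.

```python
import platform
import subprocess


def build_open_command(path, system=None):
    """Return the command list that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string fills its window-title argument.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: any other system is a Linux-like desktop with xdg-open installed.
    return ["xdg-open", path]


def open_audio_file(path):
    """Open the saved audio file in the platform's default player."""
    subprocess.run(build_open_command(path), check=False)
```

Under these assumptions, `open_audio_file(output_file)` could replace the `os.system(...)` call. Passing the command as a list also avoids shell quoting problems when the file name contains spaces.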

Papers

Showing 701–750 of 1419 papers

Title | Status | Hype
ParlamentParla: A Speech Corpus of Catalan Parliamentary Sessions | | 0
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations | | 0
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling | | 0
Penambahan emosi menggunakan metode manipulasi prosodi untuk sistem text to speech bahasa Indonesia | | 0
Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech | | 0
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis | | 0
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice | | 0
Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes | | 0
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis | | 0
Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech | | 0
Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end | | 0
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features | | 0
Positional Description for Numerical Normalization | | 0
Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar | | 0
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | | 0
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis | | 0
Preference Alignment Improves Language Model-Based TTS | | 0
Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling | | 0
Probing Deep Speaker Embeddings for Speaker-related Tasks | | 0
Probing Speaker-specific Features in Speaker Representations | | 0
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control | | 0
PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders | | 0
PromptTTS 2: Describing and Generating Voices with Text Prompt | | 0
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions | | 0
Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions | | 0
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis | | 0
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech | | 0
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech | | 0
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis | | 0
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features | | 0
Prosody-TTS: An end-to-end speech synthesis system with prosody control | | 0
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech | | 0
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech | | 0
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis | | 0
Punjabi Text-To-Speech Synthesis System | | 0
運用Python結合語音辨識及合成技術於自動化音文同步之實作 (A Python Implementation of Automatic Speech-text Synchronization Using Speech Recognition and Text-to-Speech Technology) [In Chinese] | | 0
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis | | 0
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | | 0
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning | | 0
RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations | | 0
RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis | | 0
Reading Assistance through LARA, the Learning And Reading Assistant | | 0
Real-Time Pill Identification for the Visually Impaired Using Deep Learning | | 0
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence | | 0
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis | | 0
Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images | | 0
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss | | 0
Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech | | 0
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability | | 0
DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models | | 0
Page 15 of 29

No leaderboard results yet.