SOTAVerified

Text to Speech

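from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # Note: gTTS validates the language code by default, so this call
    # fails if "ku" is not supported by the installed gTTS version.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the Kurdish text means "Hello, this is my voice in Kurdish."):
# text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")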
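The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a portable alternative follows, using only the standard library. The helper names `build_open_command` and `open_audio` are hypothetical, and the fallback assumes a Linux-like desktop where `xdg-open` is installed.

```python
import platform
import subprocess


def build_open_command(path, system=None):
    """Return the argument list that opens `path` with the default application.

    `system` defaults to platform.system(); pass it explicitly for testing.
    """
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        # macOS ships the `open` command.
        return ["open", path]
    # Assumption: any other platform is a Linux-like desktop with xdg-open.
    return ["xdg-open", path]


def open_audio(path):
    """Open the audio file without building a shell command string."""
    subprocess.run(build_open_command(path), check=False)
```

Passing the path as a separate list element, rather than interpolating it into a shell string, also avoids problems with file names that contain spaces.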

Papers

Showing 351-400 of 1419 papers

Title | Status | Hype
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | | 0
A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation | | 0
Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System | | 0
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | | 0
Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling | | 0
A Novel Approach to OCR using Image Recognition based Classification for Ancient Tamil Inscriptions in Temples | | 0
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset | | 0
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | | 0
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model | | 0
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis | | 0
Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 | | 0
LAraBench: Benchmarking Arabic AI with Large Language Models | | 0
An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis | | 0
A Challenge Set and Methods for Noun-Verb Ambiguity | | 0
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS | | 0
Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters | | 0
DNN-based Speech Synthesis for Indian Languages from ASCII text | | 0
Efficient data selection employing Semantic Similarity-based Graph Structures for model training | | 0
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data | | 0
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS | | 0
A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data | | 0
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | | 0
Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | | 0
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS | | 0
An In-depth Analysis of the Effect of Text Normalization in Social Media | | 0
Discovering the Italian literature: interactive access to audio indexed text resources | | 0
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation | | 0
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT | | 0
Direct Text to Speech Translation System using Acoustic Units | | 0
An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer | | 0
Voice Impression Control in Zero-Shot TTS | | 0
Efficient Incremental Text-to-Speech on GPUs | | 0
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers | | 0
Direct Speech to Speech Translation: A Review | | 0
An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | | 0
DiscreTalk: Text-to-Speech as a Machine Translation Problem | | 0
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech | | 0
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing | | 0
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization | | 0
DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction | | 0
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage | | 0
Distribution augmentation for low-resource expressive text-to-speech | | 0
Digital Einstein Experience: Fast Text-to-Speech for Conversational AI | | 0
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | | 0
DiffVoice: Text-to-Speech with Latent Diffusion | | 0
Does Audio Deepfake Detection Generalize? | | 0
Do Prosody Transfer Models Transfer Prosody? | | 0
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech | | 0
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes | | 0
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis | | 0
Page 8 of 29

No leaderboard results yet.