LoRP-TTS: Low-Rank Personalized Text-To-Speech

2025-02-11

Łukasz Bondaruk, Jakub Kubiak

Abstract

Speech synthesis models convert written text into natural-sounding audio. While earlier models were limited to a single speaker, recent advancements have led to zero-shot systems that generate realistic speech for a wide range of speakers, using samples of their voices as additional prompts. However, these systems still struggle to imitate non-studio-quality samples that differ significantly from the training datasets. In this work, we demonstrate that Low-Rank Adaptation (LoRA) makes it possible to use even a single recording of spontaneous speech in a noisy environment as a prompt. This approach improves speaker similarity by up to 30 percentage points while preserving content and naturalness. It represents a significant step toward creating truly diverse speech corpora, which are crucial for all speech-related tasks.
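For readers unfamiliar with LoRA, the general idea can be sketched as follows. This is a minimal illustration of a generic low-rank update on a single linear layer, not the authors' implementation: the helper names (`matmul`, `transpose`, `lora_forward`, `merge`), the toy matrices, and the `alpha` scaling value are all assumptions chosen for the example. The pretrained weight `W` stays frozen, and only the small factors `A` and `B` would be trained.

```python
# Generic LoRA sketch on one linear layer (illustrative only; all names are hypothetical).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def lora_forward(x, W, A, B, alpha):
    """Compute y = x W^T + (alpha / r) * x A^T B^T.

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable factor, shape (r, d_in)
    B: trainable factor, shape (d_out, r)
    """
    r = len(A)
    scale = alpha / r
    base = matmul(x, transpose(W))
    low_rank = matmul(matmul(x, transpose(A)), transpose(B))
    return [[b + scale * l for b, l in zip(rb, rl)] for rb, rl in zip(base, low_rank)]

def merge(W, A, B, alpha):
    """Fold the low-rank update into the weight: W' = W + (alpha / r) * B A."""
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]

# Toy example with d_in = 3, d_out = 2, rank r = 1 (values are arbitrary).
W = [[1, 0, 2], [0, 1, -1]]
A = [[1, 1, 0]]
B = [[0.5], [-1]]
x = [[1, 2, 3]]

y_adapted = lora_forward(x, W, A, B, alpha=2)
y_merged = matmul(x, transpose(merge(W, A, B, alpha=2)))
```

Applying the adapter on the fly and merging it into the weight give the same output, which is why an adapter of this kind adds no inference cost once merged. Initializing `B` to zeros makes the adapted layer start out identical to the pretrained one.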

Tasks

Speech Synthesis, text-to-speech, Text to Speech
