A Corpus of Neutral Voice Speech in Brazilian Portuguese

2021-05-21, International Conference on Computational Processing of the Portuguese Language 2021

Pedro H. L. Leite, Edmundo Hoyle, Álvaro Antelo, Luiz F. Kruszielski, Luiz W. P. Biscainho

Abstract

This work presents a new database containing high-sampling-rate recordings of a single male speaker reading sentences in Brazilian Portuguese in a neutral voice, along with the corresponding text corpus. Intended for synthesis and other speech-oriented applications, the dataset contains text scripts extracted from a popular Brazilian news TV program, read aloud by a trained individual in a controlled environment, resulting in roughly 20 h of audio data. The text was normalized during the recording process, and special textual occurrences (e.g. acronyms, numbers, foreign names) were replaced by their phonetic rendering as readable text in Portuguese. There are no noticeable accidental sounds, and background noise has been kept to a minimum in all audio samples. To illustrate the potential benefits of having this data available, text-to-speech experiments were conducted using state-of-the-art models for speech synthesis (Tacotron 2 and WaveGlow). As a result, we obtained intelligible and natural-sounding voices from as little as 8 min of audio from an unseen target speaker, after first training on our data; moreover, increasing the target recording time to 75 min noticeably improved pronunciation accuracy.

Tasks

Speech Synthesis, Text-to-Speech
