
Manual Speech Synthesis Data Acquisition - From Script Design to Recording Speech

2020-05-01 · LREC 2020

Atli Sigurgeirsson, Gunnar Örnólfsson, Jón Guðnason

Abstract

Atli Þór Sigurgeirsson, atlithors@ru.is, Reykjavik University; Gunnar Thor Örnólfsson, gunnarthor@hi.is, Árni Magnússon Institute for Icelandic Studies; Dr. Jón Guðnason, jg@ru.is

In this paper we present the work of collecting a large amount of high-quality speech synthesis data for Icelandic. Eight speakers will be recorded for 20 hours each. A script design strategy is proposed, and three reading scripts of varying length have been generated to maximize diphone coverage. The largest script contains 14,400 prompts and includes 87.3% of all Icelandic diphones at least once and 81% of all Icelandic diphones at least twenty times. A recording client was developed to facilitate recording sessions. The client supports easy script import and the maintenance of multiple collections in parallel, and the recorded data can be downloaded directly from the client. Recording sessions started in October 2019 and are carried out in a professional studio under supervision. At the time of writing, 58.7 hours of high-quality speech data have been collected. The scripts, the recording software, and the speech data will later be released under a CC-BY 4.0 license.
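To make the coverage figures above concrete, the following is a minimal Python sketch of how diphone coverage of a prompt set could be measured and how prompts might be picked to increase it. The abstract does not describe the actual selection procedure, so the greedy heuristic here is an assumption, as are the function names `diphones`, `coverage`, and `greedy_select` and the representation of a prompt as a list of phone symbols.

```python
from collections import Counter


def diphones(phones):
    """Return the adjacent phone pairs (diphones) of one phonetised prompt."""
    return list(zip(phones, phones[1:]))


def coverage(prompts, inventory, min_count=1):
    """Fraction of diphones in `inventory` that occur at least `min_count` times."""
    counts = Counter(d for p in prompts for d in diphones(p))
    covered = sum(1 for d in inventory if counts[d] >= min_count)
    return covered / len(inventory)


def greedy_select(candidates, inventory, n_prompts):
    """Assumed heuristic: repeatedly take the prompt adding the most unseen diphones."""
    inventory = set(inventory)
    remaining = list(candidates)
    chosen, seen = [], set()
    while remaining and len(chosen) < n_prompts:
        # Gain of a prompt = number of inventory diphones it has that are not yet seen.
        best = max(remaining, key=lambda p: len((set(diphones(p)) & inventory) - seen))
        remaining.remove(best)
        chosen.append(best)
        seen |= set(diphones(best)) & inventory
    return chosen
```

Under these assumptions, `coverage(script, inventory)` corresponds to the "at least once" figure and `coverage(script, inventory, min_count=20)` to the "at least twenty times" figure, where `inventory` would be the set of valid Icelandic diphones and `script` the phonetised prompts.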
