Samrómur Children: An Icelandic Speech Corpus

2022-06-01 · LREC 2022

Carlos Daniel Hernandez Mena, David Erik Mollberg, Michal Borský, Jón Guðnason

Abstract

Samrómur Children is an Icelandic speech corpus intended for automatic speech recognition. It contains 131 hours of read speech from Icelandic children aged 4 to 17 years. The test portion was meticulously selected to cover as wide a range of ages as possible; we aimed to have exactly the same amount of data per age range. The speech was collected with the crowd-sourcing platform Samrómur.is, which is inspired by Mozilla’s Common Voice project. The corpus was developed within the framework of the “Language Technology Programme for Icelandic 2019–2023”; the goal of the project is to make Icelandic available in language-technology applications. Samrómur Children is the first Icelandic corpus with children’s voices available for public use under a Creative Commons license. Additionally, we present baseline experiments and results using Kaldi.
