
RSC: A Romanian Read Speech Corpus for Automatic Speech Recognition

2020-05-01 · LREC 2020

Alexandru-Lucian Georgescu, Horia Cucu, Andi Buzo, Corneliu Burileanu


Abstract

Although many efforts have been made in the last decade to enhance the speech and language resources for Romanian, the language is still considered under-resourced. While large speech corpora are available for research and commercial applications in many other languages, the largest publicly available Romanian corpus to date comprises less than 50 hours of speech. In this context, the Speech and Dialogue research group releases the Read Speech Corpus (RSC), a Romanian speech corpus developed in-house, comprising 100 hours of speech recordings from 164 different speakers. The paper describes the development of the corpus and presents baseline automatic speech recognition (ASR) results obtained with state-of-the-art ASR technology: the Kaldi speech recognition toolkit.
