A Toolbox for Construction and Analysis of Speech Datasets

2021-04-11

Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

Code

github.com/lumaku/ctc-segmentation
PyTorch · ★ 346

Abstract

Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on the work of Kürzinger et al., and, to the best of our knowledge, the dataset exploration tool is the world's first open-source tool of this kind. We demonstrate how to apply these tools to create a Russian speech dataset and to analyze existing speech datasets (Multilingual LibriSpeech, Mozilla Common Voice). The tools are open-sourced as part of the NeMo framework.

Tasks

Automatic Speech Recognition (ASR), Speech Recognition, Text to Speech
