Scribosermo: Fast Speech-to-Text models for German and other Languages

2021-10-15 · Code Available

Daniel Bermuth, Alexander Poeppel, Wolfgang Reif

Abstract

Recent Speech-to-Text models often require a large amount of hardware resources and are mostly trained on English. This paper presents Speech-to-Text models for German, as well as for Spanish and French, with special features: (a) They are small and run in real time on microcontrollers like a Raspberry Pi. (b) Using a pretrained English model, they can be trained on consumer-grade hardware with a relatively small dataset. (c) The models are competitive with other solutions and outperform them in German. In this respect, the models combine the advantages of other approaches, each of which offers only a subset of the presented features. Furthermore, the paper provides a new library for handling datasets, which focuses on easy extension with additional datasets, and shows an optimized way of transfer-learning new languages using a pretrained model from another language with a similar alphabet.

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Common Voice French | ConformerCTC-L (5-gram) | Test WER | 8.13 | | Unverified |
| Common Voice French | ConformerCTC-L (no-LM) | Test WER | 9.63 | | Unverified |
| Common Voice French | QuartzNet15x5FR (CV-only) | Test WER | 12.1 | | Unverified |
| Common Voice French | QuartzNet15x5FR (D7) | Test WER | 11 | | Unverified |
| Common Voice French | ConformerCTC-L (no-LM) | Test WER | 10.19 | | Unverified |
| Common Voice German | QuartzNet15x5DE (CV-only, 5-gram) | Test WER | 7.7 | | Unverified |
| Common Voice German | ConformerCTC-L (no LM) | Test WER | 6.68 | | Unverified |
| Common Voice German | ConformerCTC-L (5-gram) | Test WER | 4.05 | | Unverified |
| Common Voice German | QuartzNet15x5DE (D37, 5-gram) | Test WER | 6.6 | | Unverified |
| Common Voice German | ConformerCTC-L (no LM) | Test WER | 7.33 | | Unverified |
| Common Voice Italian | QuartzNet15x5IT (D5) | Test WER | 11.5 | | Unverified |
| Common Voice Spanish | QuartzNet15x5ES (CV-only) | Test WER | 10.5 | | Unverified |
| Common Voice Spanish | QuartzNet15x5ES (D8) | Test WER | 10 | | Unverified |
| Common Voice Spanish | ConformerCTC-L (no-LM) | Test WER | 7.46 | | Unverified |
| Common Voice Spanish | ConformerCTC-L (5-gram) | Test WER | 5.68 | | Unverified |
| TUDA | QuartzNet15x5DE (D37) | Test WER | 10.2 | | Unverified |
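
The benchmark results above are reported as Test WER (word error rate). As a rough illustration of what that metric measures, the following minimal sketch computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, not the evaluation code used in the paper, and the listed values are presumably this ratio expressed as a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Illustrative helper only: tokenization is a plain whitespace split,
    with no text normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is missing from the hypothesis.
    wer = word_error_rate("das ist ein test", "das ist test")
    print(f"WER: {wer * 100:.2f}")
```

Because insertions count as errors, WER can exceed 100 when the hypothesis contains many extra words.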
