Scribosermo: Fast Speech-to-Text models for German and other Languages
Daniel Bermuth, Alexander Poeppel, Wolfgang Reif
Code
- gitlab.com/jaco-assistant/corcua (official, linked in paper; framework: none)
- gitlab.com/jaco-assistant/scribosermo (official, linked in paper; framework: TensorFlow)
- gitlab.com/Jaco-Assistant/deepspeech-polyglot (framework: TensorFlow)
Abstract
Recent Speech-to-Text models often require large amounts of hardware resources and are mostly trained on English. This paper presents Speech-to-Text models for German, as well as for Spanish and French, with the following features: (a) They are small and run in real time on single-board computers such as a Raspberry Pi. (b) Starting from a pretrained English model, they can be trained on consumer-grade hardware with a relatively small dataset. (c) They are competitive with other solutions and outperform them in German. In this respect, the models combine advantages of other approaches, each of which offers only a subset of these features. Furthermore, the paper provides a new library for handling datasets, designed to be easily extended with additional datasets, and shows an optimized way to transfer-learn new languages from a pretrained model of another language with a similar alphabet.
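The abstract mentions transfer learning from a pretrained English model to a language with a similar alphabet. One plausible way to exploit a shared alphabet is sketched below; it is an illustration under stated assumptions, not necessarily the authors' exact procedure. The final character-output layer is rebuilt so that letters present in both alphabets keep their trained weights, letters missing from English borrow the weights of a similar letter through a hypothetical mapping (`SIMILAR_CHARS`), and any remaining letter gets a small random initialization. The function name `transfer_output_layer`, the mapping, and the list-of-rows weight layout are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence

# Hypothetical mapping from target letters absent in the source alphabet to a
# similar source letter (an illustrative assumption, not taken from the paper).
SIMILAR_CHARS: Dict[str, str] = {"ä": "a", "ö": "o", "ü": "u", "ß": "s"}


def transfer_output_layer(
    src_alphabet: Sequence[str],
    tgt_alphabet: Sequence[str],
    src_weights: List[List[float]],
    similar: Optional[Dict[str, str]] = None,
    seed: int = 0,
) -> List[List[float]]:
    """Build output-layer weight rows for tgt_alphabet from a trained layer.

    src_weights[i] is the weight row of the output unit for src_alphabet[i].
    """
    similar = SIMILAR_CHARS if similar is None else similar
    rng = random.Random(seed)
    src_index = {ch: i for i, ch in enumerate(src_alphabet)}
    width = len(src_weights[0])
    tgt_weights: List[List[float]] = []
    for ch in tgt_alphabet:
        # Prefer the identical character, otherwise a mapped similar one.
        source_ch = ch if ch in src_index else similar.get(ch)
        if source_ch in src_index:
            # Copy (not alias) the trained row of the same or similar character.
            tgt_weights.append(list(src_weights[src_index[source_ch]]))
        else:
            # No counterpart at all: fall back to a small random initialization.
            tgt_weights.append([rng.uniform(-0.01, 0.01) for _ in range(width)])
    return tgt_weights
```

In this sketch all other layers of the pretrained network would simply be copied unchanged. If the output layer also has a bias vector, the same per-character index mapping can be applied to it.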
Benchmark Results
| Dataset | Model | Claimed test WER (%) |
|---|---|---|
| Common Voice French | ConformerCTC-L (5-gram) | 8.13 |
| Common Voice French | ConformerCTC-L (no-LM) | 9.63 |
| Common Voice French | QuartzNet15x5FR (CV-only) | 12.1 |
| Common Voice French | QuartzNet15x5FR (D7) | 11 |
| Common Voice French | ConformerCTC-L (no-LM) | 10.19 |
| Common Voice German | QuartzNet15x5DE (CV-only, 5-gram) | 7.7 |
| Common Voice German | ConformerCTC-L (no-LM) | 6.68 |
| Common Voice German | ConformerCTC-L (5-gram) | 4.05 |
| Common Voice German | QuartzNet15x5DE (D37, 5-gram) | 6.6 |
| Common Voice German | ConformerCTC-L (no-LM) | 7.33 |
| Common Voice Italian | QuartzNet15x5IT (D5) | 11.5 |
| Common Voice Spanish | QuartzNet15x5ES (CV-only) | 10.5 |
| Common Voice Spanish | QuartzNet15x5ES (D8) | 10 |
| Common Voice Spanish | ConformerCTC-L (no-LM) | 7.46 |
| Common Voice Spanish | ConformerCTC-L (5-gram) | 5.68 |
| TUDA | QuartzNet15x5DE (D37) | 10.2 |
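All results above are word error rates (WER) in percent. At corpus level, WER is usually computed as the total number of word substitutions, deletions and insertions (a word-level Levenshtein distance) divided by the total number of reference words. The following plain-Python sketch illustrates that definition; the function names are hypothetical, it is not the evaluation script used by the authors, and text normalization (casing, punctuation), which can change WER noticeably, is assumed to have been applied beforehand.

```python
from typing import Iterable, List, Tuple


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn `reference` into `hypothesis` (Levenshtein distance)."""
    # prev[j]: distance between the reference prefix processed so far
    # and hypothesis[:j]; initially the empty reference prefix.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_errors(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return 100.0 * errors / ref_words
```

For example, `corpus_wer([("das ist ein test", "das ist test"), ("hallo welt", "hallo welt")])` counts one deletion over six reference words, about 16.7 %. Errors are summed before dividing so that short utterances are not over-weighted, as they would be when averaging per-utterance WERs.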