Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
Daniel Bermuth, Alexander Poeppel, Wolfgang Reif
Code
- gitlab.com/Jaco-Assistant/finstreder (official)
- gitlab.com/Jaco-Assistant/Jaco-Master
Abstract
In Spoken Language Understanding (SLU), the task is to extract important information from audio commands, such as the intent of what a user wants the system to do and special entities like locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers which, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU models without any additional training. Building such a model is very fast and takes only a few seconds, and the approach is completely language-independent. A comparison on different benchmarks shows that this method can outperform several other, more resource-demanding SLU approaches.
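To illustrate the core idea of embedding intents and entities into a transducer, the following is a minimal, self-contained sketch. It is not the finstreder implementation (which builds real FSTs for use with a Speech-to-Text decoder); the states, slot names, and intent labels here are invented for illustration only.

```python
# Toy illustration of intent/entity embedding in a finite state
# transducer: recognized word sequences are walked along transducer
# arcs, and accepting paths emit an intent label plus captured
# entities. NOT the finstreder implementation; a hand-rolled sketch.

# Each state maps an input word (or a wildcard entity slot like
# "<room>") to (next_state, output_tag_or_None).
TRANSITIONS = {
    0: {"turn": (1, None)},
    1: {"on": (2, None), "off": (2, None)},
    2: {"<room>": (3, "entity:room")},
}
ACCEPT = {3: "intent:SwitchLight"}  # accepting state -> intent label
ROOMS = {"kitchen", "bedroom", "livingroom"}  # toy entity lexicon


def parse(words):
    """Walk the transducer; return (intent, entities) or (None, {})."""
    state, entities = 0, {}
    for w in words:
        arcs = TRANSITIONS.get(state, {})
        if w in arcs:                              # exact word match
            state, tag = arcs[w]
        elif "<room>" in arcs and w in ROOMS:      # entity slot match
            state, tag = arcs["<room>"]
        else:                                      # no arc: reject
            return None, {}
        if tag and tag.startswith("entity:"):
            entities[tag.split(":")[1]] = w
    return ACCEPT.get(state), entities


print(parse("turn on kitchen".split()))
# -> ('intent:SwitchLight', {'room': 'kitchen'})
```

Because the transducer is just a static data structure, "building" such a model is a matter of writing down transitions rather than training anything, which is what makes construction take only seconds and stay language-independent.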
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| SLURP | Finstreder (Conformer) | Accuracy (%) | 53.11 | — | Unverified |
| SLURP | Finstreder (Quartznet) | Accuracy (%) | 43.15 | — | Unverified |