Keyword Transformer: A Self-Attention Model for Keyword Spotting
Axel Berg, Mark O'Connor, Miguel Tairum Cruz
Code
- github.com/ARM-software/keyword-transformer — official, in paper (TensorFlow) ★ 139
- github.com/mashrurmorshed/torch-kwt (PyTorch) ★ 40
- github.com/ID56/Torch-KWT (PyTorch) ★ 40
- github.com/holgerbovbjerg/data2vec-kws (PyTorch) ★ 31
- github.com/KrishnaDN/Keyword-Transformer ★ 23
- github.com/intelligentmachines/keyword_spotting_transformer (TensorFlow) ★ 9
- github.com/aau-es-ml/ssl_noise-robust_kws (PyTorch) ★ 9
- github.com/Arizona-Voice/Arizona-spotting (PyTorch) ★ 3
- github.com/phanxuanphucnd/Arizona-spotting ★ 2
- github.com/EscVM/EscVM_YT/blob/master/Notebooks/1%20-%20TF2.X%20DeepAI-Quickie/tf_2_keyword_transformer.ipynb (TensorFlow) ★ 0
Abstract
The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.
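The core idea of a fully self-attentional keyword spotter can be illustrated with a minimal, framework-free sketch: audio is converted to MFCC frames, each frame is linearly embedded as a token, a learnable class token is prepended (as in ViT), positional embeddings are added, and the class token's output is read off for classification. The shapes below (40-dim MFCC, 98 frames, 64-dim embeddings, 12 classes) and the single attention layer with random weights are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # x: (tokens, dim); single-head scaled dot-product attention
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
T, d = 98, 64                        # 98 time frames, 64-dim embeddings (illustrative)
mfcc = rng.normal(size=(T, 40))      # stand-in for 40-dim MFCC features of one utterance
w_embed = rng.normal(size=(40, d))   # linear per-frame embedding
cls = rng.normal(size=(1, d))        # learnable class token, prepended as in ViT/KWT
pos = rng.normal(size=(T + 1, d))    # learnable positional embedding

tokens = np.concatenate([cls, mfcc @ w_embed]) + pos
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(tokens, wq, wk, wv)       # (T + 1, d)
logits = out[0] @ rng.normal(size=(d, 12))     # class token -> 12 keyword logits
print(logits.shape)  # (12,)
```

A full KWT stacks several such attention layers with MLP blocks, residual connections, and layer normalization; the point here is only that no convolutional or recurrent encoder is involved anywhere in the pipeline.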