Keyword Spotting

In speech processing, keyword spotting deals with the identification of keywords in utterances.

(Image credit: Simon Grest)

Papers


Showing 1–25 of 407 papers

Title	Date	Tasks	Status	Hype
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit	May 20, 2022	All, Automatic Speech Recognition (ASR)	Code Available	6
GLAP: General contrastive audio-text pretraining across domains and languages	Jun 12, 2025	AudioCaps, Keyword Spotting	Code Available	2
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding	May 26, 2025	Keyword Spotting	Code Available	2
Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency	Dec 17, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	2
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model	May 20, 2024	Audio Classification, GPU	Code Available	2
TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer	Mar 20, 2024	Keyword Spotting	Code Available	2
WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit	Oct 30, 2022	Keyword Spotting	Code Available	2
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection	Feb 2, 2022	Audio Classification, Event Detection	Code Available	2
SSAST: Self-Supervised Audio Spectrogram Transformer	Oct 19, 2021	Audio Classification, Classification	Code Available	2
AST: Audio Spectrogram Transformer	Apr 5, 2021	Audio Classification, Audio Tagging	Code Available	2
Training Keyword Spotters with Limited and Synthesized Speech Data	Jan 31, 2020	Keyword Spotting	Code Available	2
Chameleon: A MatMul-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data	May 30, 2025	Continual Learning, Few-Shot Learning	Code Available	1
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages	Jan 14, 2025	Abusive Language, Keyword Spotting	Code Available	1
Self-Learning for Personalized Keyword Spotting on Ultra-Low-Power Audio Sensors	Aug 22, 2024	Keyword Spotting, Self-Learning	Code Available	1
Text-aware Speech Separation for Multi-talker Keyword Spotting	Jun 18, 2024	Keyword Spotting, Speech Separation	Code Available	1
ED-sKWS: Early-Decision Spiking Neural Networks for Rapid,and Energy-Efficient Keyword Spotting	Jun 14, 2024	Edge-computing, Keyword Spotting	Code Available	1
MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting	Jun 11, 2024	Data Augmentation, Keyword Spotting	Code Available	1
Sparse Binarization for Fast Keyword Spotting	Jun 9, 2024	Binarization, Keyword Spotting	Code Available	1
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language	Nov 14, 2023	Keyword Spotting	Code Available	1
Towards on-Device Keyword Spotting using Low-Footprint Quaternion Neural Models	Sep 15, 2023	Keyword Spotting	Code Available	1
PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords	Aug 31, 2023	Keyword Spotting	Code Available	1
Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems	Jun 3, 2023	Few-Shot Learning, Keyword Spotting	Code Available	1
Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers	May 30, 2023	Continual Learning, image-classification	Code Available	1
LipLearner: Customizable Silent Speech Interactions on Mobile Devices	Feb 12, 2023	Contrastive Learning, Incremental Learning	Code Available	1
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification	Nov 23, 2022	Keyword Spotting, Self-Supervised Learning	Code Available	1


Datasets: QUESST, Google Speech Commands, hey Siri, FKD, Google Speech Commands V2 35, TensorFlow, VoxForge, Google Speech Commands (v2), Google Speech Commands V2 12

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NNI non-filtered (for the development set)	Cnxe	6.09	—	Unverified
2	NNI Choi (for the development set)	Cnxe	5.89	—	Unverified
3	NTU rnn (eval)	Cnxe	2.01	—	Unverified
4	NTU dtw (eval)	Cnxe	2.01	—	Unverified
5	NTU dtw (dev)	Cnxe	2.01	—	Unverified
6	NTU rnn (dev)	Cnxe	2.01	—	Unverified
7	ELiRF SDTW (eval)	Cnxe	1.19	—	Unverified
8	ELiRF SDTW-avg (eval)	Cnxe	1.07	—	Unverified
9	ELiRF SDTW (dev)	Cnxe	1.07	—	Unverified
10	CUNY [Subseq+MFCC] (eval)	Cnxe	1.07	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveFormer	Google Speech Commands V2 12	98.8	—	Unverified
2	QNN	Google Speech Commands V2 35	98.6	—	Unverified
3	TripletLoss-res15	Google Speech Commands V1 12	98.56	—	Unverified
4	M2D	Google Speech Commands V2 35	98.5	—	Unverified
5	EAT-S	Google Speech Commands V2 35	98.15	—	Unverified
6	Audio Spectrogram Transformer	Google Speech Commands V2 35	98.11	—	Unverified
7	EdgeCRNN 2.0×	Google Speech Commands V2 12	98.05	—	Unverified
8	BC-ResNet-8	Google Speech Commands V1 12	98	—	Unverified
9	HTS-AT	Google Speech Commands V2 35	98	—	Unverified
10	Wav2KWS	Google Speech Commands V1 12	97.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Stacked 1D CNN	Error Rate	1.99	—	Unverified
2	End-to-end DNN-HMM	Error Rate	1.7	—	Unverified
3	HEiMDaL	Error Rate	0.45	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Res26	Accuracy	95.88	—	Unverified
2	EfficientNet-A0 + SA + TL	Accuracy	95.83	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	QuaternionNeuralNetwork	Accuracy (10-fold)	98.53	—	Unverified
2	SSAMBA	Accuracy (10-fold)	97.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TensorFlow's model version 2	TFMA	89.7	—	Unverified
2	TensorFlow's model version 1	TFMA	85.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	2D-ConvNet	Accuracy (%)	95.4	—	Unverified
2	1D-ConvNet	Accuracy (%)	93.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Quaternion Neural Networks	Accuracy (10-fold)	98.53	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MicroNet-KWS-L	Accuracy	95.3	—	Unverified