Sound Event Localization and Detection

Given multichannel audio input, a sound event localization and detection (SELD) system outputs a temporal activation track for each target sound class. Whenever a track indicates activity, the system also outputs one or more corresponding spatial trajectories. The result is a spatio-temporal characterization of the acoustic scene that can support a wide range of machine cognition tasks, such as inferring the type of environment, self-localization, navigation without visual input or with occluded targets, tracking specific types of sound sources, smart-home applications, scene visualization systems, and audio surveillance.
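One common way to encode the per-class activation track and its direction of arrival (DOA) jointly is the activity-coupled Cartesian DOA (ACCDOA) representation, named in one of the paper titles listed on this page. Each class is assigned a 3-D Cartesian vector: its direction is the estimated DOA and its length encodes whether the class is active. The following is a minimal Python sketch of decoding such output. It assumes at most one source per class per frame and an activity threshold of 0.5. `ACTIVITY_THRESHOLD`, `decode_frame`, and `decode_clip` are hypothetical names, not code from any listed system.

```python
import math
from typing import List, Optional, Tuple

Vector = Tuple[float, float, float]

# Assumed activity threshold on the vector length. 0.5 is a common choice in
# ACCDOA-based systems, but treat it as a tunable hyperparameter.
ACTIVITY_THRESHOLD = 0.5


def decode_frame(class_vectors: List[Vector],
                 threshold: float = ACTIVITY_THRESHOLD) -> List[Optional[Tuple[float, float]]]:
    """Decode one frame of ACCDOA-style output.

    class_vectors[c] is the 3-D Cartesian vector predicted for class c.
    Returns, per class, None if inactive, else (azimuth, elevation) in degrees.
    """
    decoded: List[Optional[Tuple[float, float]]] = []
    for x, y, z in class_vectors:
        length = math.sqrt(x * x + y * y + z * z)
        if length <= threshold:
            decoded.append(None)  # the activation track is off for this class
            continue
        # Normalise to a unit vector pointing toward the estimated source.
        ux, uy, uz = x / length, y / length, z / length
        azimuth = math.degrees(math.atan2(uy, ux))
        elevation = math.degrees(math.asin(max(-1.0, min(1.0, uz))))
        decoded.append((azimuth, elevation))
    return decoded


def decode_clip(frames: List[List[Vector]]) -> List[List[Optional[Tuple[float, float]]]]:
    """Decode every frame, giving per-class activity and DOA over time."""
    return [decode_frame(frame) for frame in frames]


if __name__ == "__main__":
    # Two frames, two classes (made-up model output for illustration).
    frames = [
        [(0.0, 0.9, 0.0), (0.1, 0.0, 0.1)],   # class 0 active to the left, class 1 silent
        [(0.0, 0.0, 0.0), (0.0, 0.0, -0.8)],  # class 0 silent, class 1 active directly below
    ]
    for t, decoded in enumerate(decode_clip(frames)):
        print(t, decoded)
```

Reading the decoded frames in time order gives, for each class, the activation track (entries that are not None) and the DOA trajectory while the class is active. Coupling activity and direction in one vector lets a single regression output serve both subtasks.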

Papers


Showing 31–40 of 65 papers

Title	Date	Tasks	Status
CST-former: Multidimensional Attention-based Transformer for Sound Event Localization and Detection in Real Scenes	Apr 17, 2025	Event Detection, Sound Event Localization and Detection	Unverified
CST-former: Transformer with Channel-Spectro-Temporal Attention for Sound Event Localization and Detection	Dec 20, 2023	Sound Event Localization and Detection	Unverified
Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes	Jun 24, 2022	Data Augmentation, Sound Event Localization and Detection	Unverified
Divided spectro-temporal attention for sound event localization and detection in real scenes for DCASE2023 challenge	Jun 5, 2023	Event Detection, Sound Event Detection	Unverified
DOA-Aware Audio-Visual Self-Supervised Learning for Sound Event Localization and Detection	Oct 30, 2024	Contrastive Learning, Self-Supervised Learning	Unverified
Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection	Jul 17, 2023	Data Augmentation, Sound Event Localization and Detection	Unverified
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection	Jun 21, 2021	Data Augmentation, Diversity	Unverified
Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios	Jun 21, 2024	Data Augmentation, Sound Event Localization and Detection	Unverified
Feature Aggregation in Joint Sound Classification and Localization Neural Networks	Oct 29, 2023	Regression, Sound Classification	Unverified
Learning Spatially-Aware Language and Audio Embeddings	Sep 17, 2024	Attribute, Contrastive Learning	Unverified


Datasets: PodcastFillers, STARSS22, L3DAS21, RWCP Sound Scene Database, TAU-NIGENS Spatial Sound Events 2021

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AVC-FillerNet	event-based F1 score	92.8	—	Unverified
2	VC-FillerNet	event-based F1 score	71	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline (MIC)	Class-dependent localization error	32.2	—	Unverified
2	Baseline (FOA)	Class-dependent localization error	29.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DualQSELD-TCN (parallel)	SELD score	0.32	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	STL-SNN	accuracy	98.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SALSA-FOA	ER≤20°	0.38	—	Unverified
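ER≤20° is a location-dependent error rate of the kind used in DCASE SELD evaluation. A detection counts as correct only if its class matches the reference and its angular error is at most 20°. The following is a simplified Python sketch of that idea, computing (S + D + I) / N frame by frame. It rests on several assumptions: at most one instance per class per frame, no grouping of frames into segments, and no multi-instance matching, all of which the official evaluation handles. It also treats a same-class prediction outside the threshold as both a miss and a false alarm. `angular_error_deg` and `location_dependent_error_rate` are hypothetical helper names.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, float, float]
# One frame: class index -> unit DOA vector (at most one instance per class,
# a simplifying assumption of this sketch).
Frame = Dict[int, Vector]


def angular_error_deg(a: Vector, b: Vector) -> float:
    """Great-circle angle in degrees between two unit vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def location_dependent_error_rate(reference: List[Frame], predicted: List[Frame],
                                  threshold_deg: float = 20.0) -> float:
    """Simplified ER within threshold_deg: (S + D + I) / N, accumulated per frame."""
    subs = dels = ins = n_ref = 0
    for ref, pred in zip(reference, predicted):
        fn = fp = 0
        n_ref += len(ref)
        for cls, ref_doa in ref.items():
            if cls in pred and angular_error_deg(ref_doa, pred[cls]) <= threshold_deg:
                continue  # true positive: right class, close enough
            fn += 1  # missed, or detected too far from the reference
        for cls, pred_doa in pred.items():
            if cls not in ref or angular_error_deg(ref[cls], pred_doa) > threshold_deg:
                fp += 1  # spurious class, or right class at the wrong location
        # A paired miss and false alarm in the same frame counts as one substitution.
        subs += min(fn, fp)
        dels += max(0, fn - fp)
        ins += max(0, fp - fn)
    return (subs + dels + ins) / n_ref if n_ref else 0.0
```

For example, over five frames containing one correct detection 10° off target, one same-class detection 90° off (a substitution), one missed event (a deletion), one spurious event (an insertion), and one exact detection, the four reference events give an error rate of 3 / 4 = 0.75. Lower is better, and the rate can exceed 1 when insertions are frequent.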