SOTAVerified

Sound Event Localization and Detection

Given multichannel audio input, a sound event localization and detection (SELD) system outputs a temporal activation track for each target sound class, along with one or more corresponding spatial trajectories whenever the track indicates activity. The result is a spatio-temporal characterization of the acoustic scene that can support a wide range of machine cognition tasks, such as inferring the type of environment, self-localization, navigation without visual input or with occluded targets, tracking specific types of sound sources, smart-home applications, scene visualization systems, and audio surveillance.
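To make the output format concrete, here is a minimal sketch that turns frame-wise SELD predictions into discrete events. It assumes a simplified format in which each class has one activity probability and one (azimuth, elevation) estimate per frame, so it tracks at most one source per class at a time. The names `SeldEvent` and `decode_seld_output` and the 0.5 threshold are hypothetical illustrations, not the API of any system listed below; real systems differ (some, for example, encode activity as the length of a Cartesian DOA vector and allow several tracks per class).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Doa = Tuple[float, float]  # (azimuth, elevation) in degrees


@dataclass
class SeldEvent:
    """One detected event: a class, a frame span, and its DOA trajectory."""
    class_idx: int
    onset: int              # first active frame (inclusive)
    offset: int             # last active frame (inclusive)
    trajectory: List[Doa]   # one DOA per active frame


def decode_seld_output(activity: Sequence[Sequence[float]],
                       doa: Sequence[Sequence[Doa]],
                       threshold: float = 0.5) -> List[SeldEvent]:
    """Group contiguous active frames of each class into events.

    activity[t][c]: probability that class c is active in frame t.
    doa[t][c]: predicted (azimuth, elevation) for class c in frame t.
    Hypothetical format: at most one source per class per frame.
    """
    n_frames = len(activity)
    n_classes = len(activity[0]) if n_frames else 0
    events: List[SeldEvent] = []
    for c in range(n_classes):
        onset = None
        # Iterate one step past the end so an event running to the last frame is closed.
        for t in range(n_frames + 1):
            active = t < n_frames and activity[t][c] >= threshold
            if active and onset is None:
                onset = t  # activation track switches on
            elif not active and onset is not None:
                # Activation track switches off: emit the event with its trajectory.
                trajectory = [doa[f][c] for f in range(onset, t)]
                events.append(SeldEvent(c, onset, t - 1, trajectory))
                onset = None
    return events


# Usage: two classes over four frames.
activity = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.0, 0.9]]
doa = [[(30.0, 0.0), (0.0, 0.0)],
       [(35.0, 5.0), (0.0, 0.0)],
       [(0.0, 0.0), (-90.0, 10.0)],
       [(0.0, 0.0), (-85.0, 10.0)]]
for event in decode_seld_output(activity, doa):
    # Class 0 is active in frames 0-1, class 1 in frames 2-3.
    print(event)
```

Thresholding each class independently mirrors the per-class activation tracks described above; the DOA is kept only for frames where the track is active, which is what gives each event its spatial trajectory.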

Papers

Showing 26–50 of 65 papers

Title | Status | Hype
Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection | Code | 1
w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training | Code | 1
Feature Aggregation in Joint Sound Classification and Localization Neural Networks | | 0
SwG-former: A Sliding-Window Graph Convolutional Network for Simultaneous Spatial-Temporal Information Extraction in Sound Event Localization and Detection | | 0
Leveraging Geometrical Acoustic Simulations of Spatial Room Impulse Responses for Improved Sound Event Detection and Localization | Code | 0
META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection | | 0
Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection | | 0
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events | Code | 1
Divided spectro-temporal attention for sound event localization and detection in real scenes for DCASE2023 challenge | | 0
Perception Test: A Diagnostic Benchmark for Multimodal Video Models | Code | 2
AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection | Code | 1
CoLoC: Conditioned Localizer and Classifier for Sound Event Localization and Detection | | 0
Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains | Code | 1
Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes | | 0
A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks | Code | 0
STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events | Code | 1
Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation | Code | 0
Filler Word Detection and Classification: A Dataset and Benchmark | Code | 0
Locate This, Not That: Class-Conditioned Sound Event DOA Estimation | | 0
L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment | Code | 1
Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments | Code | 0
Wearable SELD dataset: Dataset for sound event localization and detection using wearable devices around head | Code | 0
SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays | Code | 1
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training | Code | 1
Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection | Code | 0
Page 2 of 3

Benchmark Results

Rank | Model | Metric | Claimed | Verified | Status
1 | AVC-FillerNet | event-based F1 score | 92.8 | | Unverified
2 | VC-FillerNet | event-based F1 score | 71 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | Baseline (MIC) | Class-dependent localization error | 32.2 | | Unverified
2 | Baseline (FOA) | Class-dependent localization error | 29.3 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | DualQSELD-TCN (parallel) | SELD score | 0.32 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | STL-SNN | accuracy | 98.4 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | SALSA-FOA | ER≤20° | 0.38 | | Unverified