SOTAVerified

Sound Event Localization and Detection

Given multichannel audio input, a sound event localization and detection (SELD) system outputs a temporal activation track for each target sound class, along with one or more corresponding spatial trajectories whenever that track indicates activity. The result is a spatio-temporal characterization of the acoustic scene that can support a wide range of machine cognition tasks, such as inferring the type of environment, self-localization, navigation without visual input or with occluded targets, tracking of specific types of sound sources, smart-home applications, scene visualization systems, and audio surveillance.
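A single network output can carry both the activation track and the spatial trajectory. One example is the activity-coupled Cartesian direction-of-arrival (ACCDOA) representation (see the ACCDOA entry in the paper list). It assigns one Cartesian vector per frame and class: the vector's length encodes activity and its direction encodes the DOA. The following is a minimal NumPy sketch of decoding such an output. It assumes an array of shape (frames, classes, 3) and a 0.5 activity threshold. The function name `decode_accdoa` and the threshold are illustrative assumptions, not taken from any specific implementation. This sketch yields one trajectory per class; Multi-ACCDOA extends the idea to overlapping events of the same class.

```python
import numpy as np

def decode_accdoa(accdoa, threshold=0.5):
    """Split an ACCDOA-style output into activity tracks and DOA trajectories.

    accdoa: array of shape (frames, classes, 3), one Cartesian vector per
    frame and class (hypothetical layout assumed for this sketch).
    threshold: assumed activity threshold on the vector length.
    """
    norms = np.linalg.norm(accdoa, axis=-1)             # (frames, classes)
    active = norms > threshold                          # temporal activation tracks
    # Normalize to unit vectors, guarding against division by zero.
    unit = accdoa / np.maximum(norms[..., None], 1e-12)
    x, y, z = unit[..., 0], unit[..., 1], unit[..., 2]
    azimuth = np.degrees(np.arctan2(y, x))              # degrees in [-180, 180]
    elevation = np.degrees(np.arcsin(np.clip(z, -1.0, 1.0)))
    # A DOA is only meaningful where the class is active; mark the rest as NaN.
    azimuth = np.where(active, azimuth, np.nan)
    elevation = np.where(active, elevation, np.nan)
    return active, azimuth, elevation

# Usage: 2 frames, 2 classes (illustrative values).
out = np.zeros((2, 2, 3))
out[0, 0] = [0.0, 0.9, 0.0]   # class 0 in frame 0: long vector pointing along +y
out[1, 1] = [0.1, 0.0, 0.1]   # class 1 in frame 1: too short to count as active
active, azi, ele = decode_accdoa(out)
print(active)
print(azi)
print(ele)
```

Coupling activity and direction in one vector lets a single regression loss train detection and localization together. The price is a detection threshold that has to be chosen separately.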

Papers

Showing 1-50 of 65 papers

Title | Status | Hype
Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms | Code | 2
Perception Test: A Diagnostic Benchmark for Multimodal Video Models | Code | 2
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection | Code | 1
Learning Multi-Target TDOA Features for Sound Event Localization and Detection | Code | 1
MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection | Code | 1
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes | Code | 1
Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection | Code | 1
Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection | Code | 1
w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training | Code | 1
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events | Code | 1
AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection | Code | 1
Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains | Code | 1
STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events | Code | 1
L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment | Code | 1
SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays | Code | 1
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training | Code | 1
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection | Code | 1
What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis | Code | 1
DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection | Code | 1
A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection | Code | 1
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection | Code | 1
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019 | Code | 1
A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection | Code | 1
SELD-TCN: Sound Event Localization & Detection via Temporal Convolutional Networks | Code | 1
Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos | - | 0
Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling | - | 0
CST-former: Multidimensional Attention-based Transformer for Sound Event Localization and Detection in Real Scenes | - | 0
Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation | Code | 0
An Experimental Study on Joint Modeling for Sound Event Localization and Detection with Source Distance Estimation | - | 0
MVANet: Multi-Stage Video Attention Network for Sound Event Localization and Detection with Source Distance Estimation | Code | 0
Class-Incremental Learning for Sound Event Localization and Detection | - | 0
DOA-Aware Audio-Visual Self-Supervised Learning for Sound Event Localization and Detection | - | 0
Leveraging Reverberation and Visual Depth Cues for Sound Event Localization and Detection with Distance Estimation | - | 0
Real-Time Sound Event Localization and Detection: Deployment Challenges on Edge Devices | Code | 0
Learning Spatially-Aware Language and Audio Embeddings | - | 0
SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation | - | 0
Squeeze-and-Excite ResNet-Conformers for Sound Event Localization, Detection, and Distance Estimation for DCASE 2024 Challenge | - | 0
Text-Queried Target Sound Event Localization | - | 0
Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios | - | 0
6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human | - | 0
Overview of the L3DAS23 Challenge on Audio-Visual Extended Reality | - | 0
BAT: Learning to Reason about Spatial Sounds with Large Language Models | - | 0
CST-former: Transformer with Channel-Spectro-Temporal Attention for Sound Event Localization and Detection | - | 0
Feature Aggregation in Joint Sound Classification and Localization Neural Networks | - | 0
SwG-former: A Sliding-Window Graph Convolutional Network for Simultaneous Spatial-Temporal Information Extraction in Sound Event Localization and Detection | - | 0
Leveraging Geometrical Acoustic Simulations of Spatial Room Impulse Responses for Improved Sound Event Detection and Localization | Code | 0
META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection | - | 0
Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection | - | 0
Divided spectro-temporal attention for sound event localization and detection in real scenes for DCASE2023 challenge | - | 0
CoLoC: Conditioned Localizer and Classifier for Sound Event Localization and Detection | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AVC-FillerNet | event-based F1 score | 92.8 | - | Unverified
2 | VC-FillerNet | event-based F1 score | 71 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Baseline (MIC) | Class-dependent localization error | 32.2 | - | Unverified
2 | Baseline (FOA) | Class-dependent localization error | 29.3 | - | Unverified
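
Localization error is typically measured as the angle between the reference and estimated directions of arrival. The sketch below illustrates one plausible reading of a class-dependent version under simplifying assumptions. It assumes at most one source per class per frame, so no assignment between multiple same-class instances is needed. It averages the angular error over the frames where a class is active in both the reference and the estimate, then averages over classes. The function names and this simplified matching are hypothetical, and official challenge evaluation code may differ.

```python
import numpy as np

def angular_error_deg(ref, est):
    """Angle in degrees between unit DOA vectors with shape (..., 3)."""
    cos = np.clip(np.sum(ref * est, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def class_dependent_le(ref_doa, est_doa, ref_active, est_active):
    """Mean over classes of the per-class mean angular error (hypothetical).

    ref_doa, est_doa: (frames, classes, 3) unit vectors.
    ref_active, est_active: (frames, classes) booleans.
    Only frames where a class is active in both reference and estimate
    contribute; classes with no such frames are skipped.
    """
    both = ref_active & est_active
    err = angular_error_deg(ref_doa, est_doa)
    per_class = [err[both[:, c], c].mean()
                 for c in range(both.shape[1]) if both[:, c].any()]
    return float(np.mean(per_class)) if per_class else float("nan")

# Usage: 1 frame, 2 classes; class 0 is off by 90 degrees, class 1 is exact.
ref = np.array([[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]])
est = np.array([[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])
on = np.ones((1, 2), dtype=bool)
le = class_dependent_le(ref, est, on, on)
print(le)
```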

# | Model | Metric | Claimed | Verified | Status
1 | DualQSELD-TCN (parallel) | SELD score | 0.32 | - | Unverified
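
The SELD score aggregates detection and localization metrics into a single lower-is-better number. In the DCASE 2020 and 2021 SELD tasks, it averaged four terms: ER≤20°, 1 - F≤20°, LE_CD / 180°, and 1 - LR_CD. The table does not state which variant produced the reported 0.32. Treat the sketch below as an illustration of that aggregation only, not as the exact computation behind the entry. The function name and the input values are made up for illustration.

```python
def seld_score(er, f, le_deg, lr):
    """Average four SELD metrics into one lower-is-better score.

    er: location-aware error rate (0 is perfect).
    f: location-aware F1 score as a fraction in [0, 1].
    le_deg: localization error in degrees, normalized by its 180-degree maximum.
    lr: localization recall as a fraction in [0, 1].
    """
    return (er + (1.0 - f) + le_deg / 180.0 + (1.0 - lr)) / 4.0

# Illustrative inputs, not taken from any benchmark entry.
score = seld_score(er=0.4, f=0.6, le_deg=18.0, lr=0.8)
print(score)
```

Dividing the localization error by 180 puts it on roughly the same 0-to-1 scale as the other three terms, so no single metric dominates the average.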

# | Model | Metric | Claimed | Verified | Status
1 | STL-SNN | accuracy | 98.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SALSA-FOA | ER≤20° | 0.38 | - | Unverified