Sound Event Localization and Detection
Given multichannel audio input, a sound event localization and detection (SELD) system outputs a temporal activation track for each target sound class, together with one or more spatial trajectories whenever that track indicates activity. The result is a spatio-temporal characterization of the acoustic scene that can support a wide range of machine cognition tasks, such as inferring the type of environment, self-localization, navigation without visual input or with occluded targets, tracking specific types of sound sources, smart-home applications, scene visualization, and audio surveillance.
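As a minimal sketch of this output format, assume a network emits, for every frame, an activity probability per class and a Cartesian direction-of-arrival (DOA) vector per class. The hypothetical helper `decode_seld_output` keeps a DOA only while its class is active (probability above an assumed threshold of 0.5) and converts it to azimuth and elevation in degrees. For simplicity it allows one source per class per frame; real systems differ in output encoding and post-processing.

```python
import math


def decode_seld_output(activity, doa, threshold=0.5):
    """Convert frame-wise SELD outputs into (frame, class, azimuth, elevation) detections.

    Hypothetical helper for illustration.
    activity[t][c]: activity probability of class c in frame t.
    doa[t][c]: Cartesian (x, y, z) DOA vector of class c in frame t,
               assumed non-zero whenever the class is active.
    """
    detections = []
    for t, frame_probs in enumerate(activity):
        for c, prob in enumerate(frame_probs):
            if prob <= threshold:
                continue  # inactive class: no trajectory point in this frame
            x, y, z = doa[t][c]
            norm = math.sqrt(x * x + y * y + z * z)
            azimuth = math.degrees(math.atan2(y, x))
            # Clamp to [-1, 1] to guard asin against rounding error.
            elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
            detections.append((t, c, azimuth, elevation))
    return detections


activity = [[0.9, 0.1], [0.2, 0.7]]
doa = [[[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
       [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]]]
print(decode_seld_output(activity, doa))
```

Only the class-0 source in frame 0 and the class-1 source in frame 1 pass the threshold, so the activation track effectively gates which spatial estimates are reported.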
Datasets: PodcastFillers, STARSS22, L3DAS21, RWCP Sound Scene Database, TAU-NIGENS Spatial Sound Events 2021
Benchmark Results
PodcastFillers

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AVC-FillerNet | Event-based F1 score (%) | 92.8 | — | Unverified |
| 2 | VC-FillerNet | Event-based F1 score (%) | 71 | — | Unverified |
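Event-based F1 scores are typically computed by matching detected events to reference events of the same label within a time tolerance. A minimal sketch, assuming events given as (onset in seconds, label) pairs, greedy one-to-one matching on onsets only, and a 0.2 s collar; the helper name and the collar are illustrative assumptions, and the evaluation behind the numbers above may match events differently or also check offsets.

```python
def event_based_f1(reference, estimated, collar=0.2):
    """Onset-based event F1 (hypothetical helper; greedy matching, not an official scorer).

    reference, estimated: lists of (onset_seconds, label).
    An estimated event is a true positive if a not-yet-matched reference event
    has the same label and an onset within `collar` seconds.
    """
    matched = set()
    tp = 0
    for est_onset, est_label in estimated:
        for i, (ref_onset, ref_label) in enumerate(reference):
            if (i not in matched and ref_label == est_label
                    and abs(ref_onset - est_onset) <= collar):
                matched.add(i)  # each reference event can be matched only once
                tp += 1
                break
    if tp == 0:
        return 0.0
    precision = tp / len(estimated)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


reference = [(1.0, "uh"), (3.0, "um"), (5.0, "uh")]
estimated = [(1.1, "uh"), (3.05, "uh"), (5.3, "uh")]
print(event_based_f1(reference, estimated))
```

Here only the first detection counts: the second has the wrong label and the third misses the collar, giving precision and recall of 1/3 each.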

STARSS22

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Baseline (MIC) | Class-dependent localization error (°) | 32.2 | — | Unverified |
| 2 | Baseline (FOA) | Class-dependent localization error (°) | 29.3 | — | Unverified |
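Class-dependent localization error is an average angular distance between predicted and reference directions, computed only where the predicted class matches the reference class. A minimal sketch, assuming directions as (azimuth, elevation) in degrees and at most one source per class per frame, so pairs can be matched by a (frame, class) key; both helper names are hypothetical, and official scorers also handle several simultaneous sources of the same class, which this sketch does not.

```python
import math


def angular_distance_deg(azi1, ele1, azi2, ele2):
    """Great-circle angle in degrees between two directions given in degrees."""
    a1, e1, a2, e2 = map(math.radians, (azi1, ele1, azi2, ele2))
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def class_dependent_localization_error(reference, predicted):
    """Mean angular error over (frame, class) keys present in both inputs.

    reference, predicted: dicts mapping (frame, class) -> (azimuth_deg, elevation_deg).
    Returns NaN if no class is both referenced and predicted in any frame.
    """
    errors = [angular_distance_deg(*reference[key], *predicted[key])
              for key in reference.keys() & predicted.keys()]
    return sum(errors) / len(errors) if errors else float("nan")


reference = {(0, 0): (0.0, 0.0), (1, 0): (90.0, 0.0), (1, 1): (0.0, 90.0)}
predicted = {(0, 0): (10.0, 0.0), (1, 0): (90.0, 0.0), (2, 1): (0.0, 0.0)}
print(class_dependent_localization_error(reference, predicted))
```

Only the two class-0 pairs are compared (errors of 10° and 0°); the unmatched class-1 entries affect detection metrics, not this error.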

L3DAS21

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DualQSELD-TCN (parallel) | SELD score (lower is better) | 0.32 | — | Unverified |
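The SELD score folds detection and localization into a single number. This sketch assumes the aggregate used in DCASE 2020 to 2022 Task 3, SELD = (ER₂₀ + (1 − F₂₀) + LE_CD/180° + (1 − LR_CD)) / 4, where ER₂₀ and F₂₀ are the location-dependent error rate and F-score at a 20° threshold, LE_CD is the class-dependent localization error, and LR_CD is the class-dependent localization recall. The score reported above may have been computed with a different variant, and the helper name is hypothetical.

```python
def seld_score(er_20, f_20, le_cd_deg, lr_cd):
    """Aggregate SELD score, assuming the DCASE 2020-2022 Task 3 definition (lower is better).

    er_20: location-dependent error rate (20 degree threshold).
    f_20: location-dependent F-score in [0, 1].
    le_cd_deg: class-dependent localization error in degrees, normalized by 180.
    lr_cd: class-dependent localization recall in [0, 1].
    """
    return (er_20 + (1.0 - f_20) + le_cd_deg / 180.0 + (1.0 - lr_cd)) / 4.0


print(seld_score(er_20=0.4, f_20=0.6, le_cd_deg=18.0, lr_cd=0.7))
```

A perfect system scores 0; each term is oriented so that lower is better before averaging.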

RWCP Sound Scene Database

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STL-SNN | Accuracy (%) | 98.4 | — | Unverified |

TAU-NIGENS Spatial Sound Events 2021

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SALSA-FOA | Location-dependent error rate ER≤20° (lower is better) | 0.38 | — | Unverified |
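ER≤20° is a detection error rate in which a prediction counts as correct only if its class is right and its direction lies within 20° of the reference. A minimal sketch, assuming per-frame dicts mapping class to a unit DOA vector (at most one source per class per frame): missed references and unmatched predictions are paired into substitutions, and the remainder become deletions or insertions, giving ER = (S + D + I) / N. The helper name is hypothetical, and official evaluations aggregate over segments and match same-class sources in ways this per-frame sketch does not.

```python
import math


def location_dependent_error_rate(reference, predicted, threshold_deg=20.0):
    """Per-frame ER within an angular threshold (simplified sketch, not an official scorer).

    reference, predicted: lists with one dict per frame, mapping class -> unit (x, y, z) DOA.
    A prediction is a true positive only if its class is in the reference frame and its
    angular error is at most threshold_deg; otherwise it counts as a false positive.
    """
    subs = dels = ins = n_ref = 0
    for ref_frame, pred_frame in zip(reference, predicted):
        tp = 0
        for cls, pred_vec in pred_frame.items():
            if cls in ref_frame:
                dot = sum(p * r for p, r in zip(pred_vec, ref_frame[cls]))
                angle = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
                if angle <= threshold_deg:
                    tp += 1
        fn = len(ref_frame) - tp
        fp = len(pred_frame) - tp
        subs += min(fn, fp)      # a miss paired with a false alarm is one substitution
        dels += max(0, fn - fp)  # leftover misses
        ins += max(0, fp - fn)   # leftover false alarms
        n_ref += len(ref_frame)
    return (subs + dels + ins) / n_ref if n_ref else float("nan")


reference = [{0: (1.0, 0.0, 0.0)}, {0: (1.0, 0.0, 0.0), 1: (0.0, 1.0, 0.0)}, {}]
predicted = [{0: (1.0, 0.0, 0.0)}, {0: (0.0, 0.0, 1.0)}, {1: (0.0, 1.0, 0.0)}]
print(location_dependent_error_rate(reference, predicted))
```

In the second frame the class-0 prediction is 90° off, so it is treated as a false alarm plus a miss (one substitution) and the class-1 source is a deletion; the third frame adds an insertion, for three errors over three reference events.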