wav2pos: Sound Source Localization using Masked Autoencoders
Axel Berg, Jens Gulin, Mark O'Connor, Chuteng Zhou, Karl Åström, Magnus Oskarsson
Code: github.com/axeber01/wav2pos (official PyTorch implementation)
Abstract
We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that such a formulation allows for accurate localization of the sound source by reconstructing coordinates masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of audio recordings and microphone coordinates is missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning-based localization methods.
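To make the set-to-set formulation concrete, the sketch below assembles a toy input of the kind the abstract describes: per-microphone audio tokens plus coordinate tokens, with the unknown source position masked out for the model to reconstruct. This is a minimal NumPy illustration under assumed sizes; the centroid "decoder" is a hypothetical placeholder standing in for the paper's trained masked autoencoder, which would also attend to the audio tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (sizes are illustrative assumptions, not the paper's):
# n_mics microphones, each with a short audio snippet and a 3D coordinate.
n_mics, n_samples = 4, 256
audio = rng.standard_normal((n_mics, n_samples))      # per-mic recordings
mic_coords = rng.uniform(0.0, 5.0, size=(n_mics, 3))  # known mic positions
source_coord = rng.uniform(0.0, 5.0, size=(1, 3))     # target to recover

# Token set for the autoencoder: one coordinate token per microphone plus
# one token for the (unknown) source position, which is masked in the input.
coords = np.vstack([mic_coords, source_coord])
mask = np.zeros(len(coords), dtype=bool)
mask[-1] = True                                # mask the source-position token
masked_coords = np.where(mask[:, None], 0.0, coords)

# Hypothetical stand-in decoder: predict each masked coordinate as the
# centroid of the visible ones. A trained model would use the audio tokens
# too; this only illustrates the set-to-set reconstruction target.
pred = masked_coords.copy()
pred[mask] = coords[~mask].mean(axis=0)

error = float(np.linalg.norm(pred[-1] - source_coord))
print(masked_coords.shape, error)
```

The same masking mechanism extends to missing microphone positions or recordings: any subset of coordinate (or audio) tokens can be masked, and the model is asked to reconstruct them from the remaining set.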