SOTAVerified

A Universal Learnable Audio Frontend

2021-01-01 · ICLR 2021 · Unverified

Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi


Abstract

Mel-filterbanks are fixed, engineered audio features that emulate human perception and have remained the standard representation throughout the history of audio understanding. However, their undeniable qualities are counterbalanced by the fundamental limitations of hand-crafted representations. In this work we show that we can train a single, universal learnable frontend that outperforms mel-filterbanks over a wide range of audio domains, including speech, music, audio events, and animal sounds, providing an unprecedented general-purpose learned frontend for audio. To do so, we introduce a new principled, lightweight, fully learnable architecture that can be used as a drop-in replacement for mel-filterbanks. Our system learns all operations of audio feature extraction, from filtering to pooling, compression, and normalization, and can be integrated into any neural network at a negligible parameter cost. We perform multi-task training on 8 diverse audio classification tasks, and show consistent improvements of our model over mel-filterbanks and previous learnable alternatives. Moreover, our system is competitive with the current state-of-the-art learnable frontend on AudioSet, with orders of magnitude fewer parameters.
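The abstract describes a frontend whose stages — filtering, pooling, and compression — are all parameterized so they can be learned by backpropagation instead of being fixed like mel-filterbanks. The sketch below illustrates that pipeline shape in plain NumPy. It is not the authors' implementation: the Gabor-style kernels, the fixed average pooling, and the log compression offset are illustrative stand-ins, with initial values chosen by hand rather than learned.

```python
import numpy as np

def gabor_filters(n_filters=8, kernel_size=101):
    """Build complex Gabor-style kernels from per-filter (center, width) params.

    The centers and widths here are illustrative initial values; in a
    learnable frontend they would be trainable parameters updated by backprop.
    """
    t = np.arange(kernel_size) - kernel_size // 2
    centers = np.linspace(0.05, 0.45, n_filters) * 2 * np.pi  # rad/sample
    widths = np.full(n_filters, 10.0)                         # in samples
    # Complex sinusoid modulated by a Gaussian envelope, one kernel per row.
    return np.exp(1j * centers[:, None] * t) * np.exp(-(t / widths[:, None]) ** 2 / 2)

def frontend(x, kernels, pool=160, eps=1e-6):
    """Filter -> squared modulus (energy) -> pooling -> log compression.

    Non-overlapping average pooling and a fixed log offset stand in for the
    learnable lowpass pooling and learnable compression of a real system.
    """
    feats = []
    for k in kernels:
        y = np.convolve(x, k, mode="same")
        energy = np.abs(y) ** 2
        n_frames = len(energy) // pool
        pooled = energy[: n_frames * pool].reshape(n_frames, pool).mean(axis=1)
        feats.append(np.log(pooled + eps))
    return np.stack(feats)  # shape: (n_filters, n_frames)
```

With 16 000 input samples and a pooling window of 160 samples, the output is an (8, 100) time-frequency map, analogous to a mel-spectrogram, but every stage that produced it could in principle be trained end to end with the downstream classifier.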
