EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

2022-07-12

Jan Schlüter, Gerald Gutenbrunner

Code available.

Abstract

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy Normalization (PCEN), has shown promising results, but is computationally expensive. With inhomogeneous convolution kernel sizes and strides, and by replacing PCEN with better parallelizable operations, we can reach similar results more efficiently. In experiments on six audio classification tasks, our frontend matches the accuracy of LEAF at 3% of the cost, but both fail to consistently outperform a fixed mel filterbank. The quest for learnable audio frontends is not solved.
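To illustrate why the abstract singles out PCEN as a target for "better parallelizable operations", the following is a minimal pure-Python sketch of PCEN as it is commonly defined: a first-order smoother M(t) = (1 - s) M(t-1) + s E(t) followed by the compression (E / (eps + M)^alpha + delta)^r - delta^r. It is not the authors' implementation and does not show EfficientLEAF's replacement. The function name `pcen`, the list-of-frames input layout, and the default parameter values are assumptions chosen for illustration.

```python
def pcen(energies, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to a list of frames, each a list of per-channel energies.

    Parameter defaults are illustrative assumptions, not values from the paper.
    """
    if not energies:
        return []
    # Initialise the smoother with the first frame (one common convention).
    smoothed = list(energies[0])
    out = []
    for frame in energies:
        # First-order IIR smoother: step t needs the result of step t-1,
        # so this loop over time cannot be trivially parallelised.
        smoothed = [(1 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Adaptive gain control followed by root compression, per channel.
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return out
```

Each channel is processed independently, but every time step depends on the previous smoothed value, which is the sequential dependency that makes this form of normalization costly on parallel hardware.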

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| BirdCLEF 2021 | LEAF | Accuracy | 42.3 | | Unverified |
| BirdCLEF 2021 | melspect | Accuracy | 39.9 | | Unverified |
| BirdCLEF 2021 | EfficientLEAF (8s) | Accuracy | 72.2 | | Unverified |
| BirdCLEF 2021 | EfficientLEAF | Accuracy | 42.9 | | Unverified |
| CREMA-D | EfficientLEAF | Accuracy | 60.2 | | Unverified |
| CREMA-D | melspect | Accuracy | 58.8 | | Unverified |
| CREMA-D | LEAF | Accuracy | 50.2 | | Unverified |
| Speech Commands | EfficientLEAF | Accuracy | 95.2 | | Unverified |
| Speech Commands | LEAF | Accuracy | 95.1 | | Unverified |
| Speech Commands | melspect | Accuracy | 95.1 | | Unverified |