
Audio Explanation Synthesis with Generative Foundation Models

2024-10-10 · Code Available

Alican Akman, Qiyang Sun, Björn W. Schuller


Abstract

The increasing success of audio foundation models across various tasks has led to a growing need for improved interpretability to understand their intricate decision-making processes better. Existing methods primarily focus on explaining these models by attributing importance to elements within the input space based on their influence on the final decision. In this paper, we introduce a novel audio explanation method that capitalises on the generative capacity of audio foundation models. Our method leverages the intrinsic representational power of the embedding space within these models by integrating established feature attribution techniques to identify significant features in this space. The method then generates listenable audio explanations by prioritising the most important features. Through rigorous benchmarking against standard datasets, including keyword spotting and speech emotion recognition, our model demonstrates its efficacy in producing audio explanations.
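The pipeline the abstract describes — encode the audio, attribute importance to features in the embedding space, then generate a listenable explanation from only the most important features — can be sketched in a toy form. The sketch below is an illustrative assumption, not the authors' implementation: it stands in for the foundation model's encoder/decoder with a random linear map and its pseudo-inverse, and uses simple occlusion as the feature attribution technique (the paper integrates established attribution methods more generally).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a generative audio foundation model:
# a linear "encoder" from a 64-sample waveform to a 16-dim embedding,
# and its pseudo-inverse as the "decoder" back to audio.
W = rng.standard_normal((16, 64))
W_pinv = np.linalg.pinv(W)

def encode(x):
    return W @ x

def decode(z):
    return W_pinv @ z

def classify(z):
    # Toy downstream decision: a fixed linear probe on the embedding.
    w_cls = np.linspace(-1.0, 1.0, z.shape[0])
    return float(w_cls @ z)

def occlusion_attribution(z):
    """Score each embedding dimension by how much the decision
    changes when that dimension is zeroed out (occluded)."""
    base = classify(z)
    scores = np.empty_like(z)
    for i in range(z.shape[0]):
        z_occ = z.copy()
        z_occ[i] = 0.0
        scores[i] = abs(base - classify(z_occ))
    return scores

def explain(x, top_k=4):
    """Keep only the top-k most important embedding features and
    decode them back into a listenable waveform explanation."""
    z = encode(x)
    scores = occlusion_attribution(z)
    keep = np.argsort(scores)[-top_k:]
    mask = np.zeros_like(z)
    mask[keep] = 1.0
    return decode(z * mask)

x = rng.standard_normal(64)       # placeholder input waveform
explanation = explain(x, top_k=4) # audio carrying only the salient features
```

In a real system the encoder/decoder would be the foundation model itself, so the decoded explanation remains plausible audio rather than a linear reconstruction; the structure of the loop — attribute in embedding space, mask, regenerate — is what the sketch illustrates.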
