Understanding Task Representations in Neural Networks via Bayesian Ablation

2025-05-19

Andrew Nam, Declan Campbell, Thomas Griffiths, Jonathan Cohen, Sarah-Jane Leslie

Abstract

Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging due to their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.
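The abstract describes placing a distribution over representational units and inferring each unit's causal contribution to task performance. The sketch below is one minimal way to illustrate that idea on a toy model; it is not the authors' implementation. The uniform Bernoulli prior over ablation masks, the exponentiated-loss weighting, the temperature `beta`, and every function name here are assumptions made for illustration only.

```python
import numpy as np

def ablation_posterior(loss_fn, n_units, n_samples=2000, beta=5.0, seed=0):
    """Estimate P(unit kept | low task loss) by weighting sampled ablation masks.

    Assumption: prior is Bernoulli(0.5) per unit, and the pseudo-likelihood
    of a mask is exp(-beta * loss). Neither is taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Sample binary masks from the prior: 1 keeps a unit, 0 ablates it.
    masks = rng.integers(0, 2, size=(n_samples, n_units)).astype(float)
    losses = np.array([loss_fn(m) for m in masks])
    # Self-normalized weights; subtract the max for numerical stability.
    log_w = -beta * losses
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Posterior marginal probability that each unit is kept.
    return w @ masks

def binary_entropy(p, eps=1e-12):
    """Entropy in bits of a Bernoulli(p) variable, elementwise."""
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Toy "network" (hypothetical): six hidden units, only units 0 and 1
# influence the output, so only they should matter for the task.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 6))
w_out = np.array([2.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_out

def toy_loss(mask):
    # Mean squared error of the toy model with the masked units zeroed out.
    pred = (X * mask) @ w_out
    return float(np.mean((pred - y) ** 2))

keep_prob = ablation_posterior(toy_loss, n_units=6)
unit_entropy = binary_entropy(keep_prob)
print(np.round(keep_prob, 2))
print(np.round(unit_entropy, 2))
```

In this toy setting, units whose posterior keep-probability is close to 1 are the ones the task depends on, while units that stay near the prior of 0.5 (entropy near 1 bit) are ones the task is indifferent to. How concentrated the low-entropy units are gives a rough, informal analogue of the "distributedness" the abstract mentions.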

Tasks

Bayesian Inference
