
How Neural Networks Organize Concepts: Introducing Concept Trajectory Analysis for Deep Learning Interpretability

2025-06-01 · Independent Research 2025 · Code Available

Andrew Smigaj


Abstract

We present Concept Trajectory Analysis (CTA), an interpretability method that tracks how neural networks organize concepts by following their paths through clustered activation spaces across layers. Applying CTA to GPT-2 with 1,228 single-token words revealed that the model organizes language primarily by grammatical function rather than semantic meaning. We found that 48.5% of words converge to grammatical highways in which nouns, whether animals, objects, or abstracts, travel together while retaining semantic distinctions at finer scales (χ² = 95.90, p < 0.0001). CTA combines geometric clustering with trajectory tracking to quantify how concepts flow through networks. Our method introduces windowed analysis to identify phase transitions (semantic→grammatical in GPT-2) and leverages LLMs to generate interpretable cluster labels. In medical AI, CTA exposed how a heart disease model stratifies patients through risk pathways, revealing demographic biases (male overprediction in Path 4, which is 83% male). By making neural organization visible and quantifiable, CTA provides actionable insights for model debugging, bias detection, and scientific understanding of deep learning. Our open-source implementation enables researchers to apply CTA to any neural network, advancing interpretable AI across domains.
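The core pipeline described above — cluster each layer's activations, then read off a token's trajectory as the sequence of cluster IDs it visits — can be sketched as follows. This is a minimal illustrative sketch on synthetic data, not the paper's actual implementation; all names and parameters are hypothetical.

```python
# Hypothetical sketch of Concept Trajectory Analysis (CTA):
# cluster each layer's activations independently, then treat a token's
# "trajectory" as the tuple of cluster IDs it passes through across layers.
import numpy as np
from sklearn.cluster import KMeans
from collections import Counter

rng = np.random.default_rng(0)
n_tokens, n_layers, dim, k = 200, 4, 16, 3

# Stand-in activations: one (n_tokens, dim) matrix per layer.
# In a real analysis these would be hidden states from a model like GPT-2.
activations = [rng.normal(size=(n_tokens, dim)) for _ in range(n_layers)]

# Cluster every layer independently.
labels_per_layer = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(acts)
    for acts in activations
]

# A trajectory is the sequence of cluster IDs a token visits layer by layer.
trajectories = list(zip(*labels_per_layer))

# "Highways" correspond to trajectories shared by many tokens.
highway_counts = Counter(trajectories)
top_path, top_count = highway_counts.most_common(1)[0]
print(f"{len(highway_counts)} distinct paths; "
      f"most common path {top_path} carries {top_count} tokens")
```

With real activations, the concentration of tokens on a few shared paths (and how it changes under a sliding window of layers) is what would surface the semantic→grammatical transitions the abstract describes.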
