Unveiling Language Skills via Path-Level Circuit Discovery

2024-10-02

Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang


Code

github.com/zodiark-ch/language-skill-of-llms
Official · In paper · PyTorch · ★ 7

Abstract

Circuit discovery with edge-level ablation has become a foundational framework for mechanistic interpretability of language models. However, its focus on individual edges often overlooks the sequential, path-level causal relationships that underpin complex behaviors, thus potentially leading to misleading or incomplete circuit discoveries. To address this issue, we propose a novel path-level circuit discovery framework that captures how behaviors emerge through interconnected linear chains and build towards complex behaviors. Our framework is constructed upon a fully disentangled linear combination of "memory circuits" decomposed from the original model. To discover functional circuit paths, we use a two-step pruning strategy: we first reduce the computational graph to a faithful and minimal subgraph, and then apply causal mediation to identify the common paths of a specific skill, termed skill paths. In contrast to the circuit graphs of existing works, we focus on the complete paths of a generic skill rather than on the fine-grained responses to individual components of the input. To demonstrate this, we explore three generic language skills with our framework, namely the Previous Token Skill, the Induction Skill, and the In-Context Learning Skill, and provide more compelling evidence to substantiate the stratification and inclusiveness of these skills.

Tasks

Disentanglement, In-Context Learning, Language Modelling
