
Uncovering hidden geometry in Transformers via disentangling position and context

2023-10-07

Jiajun Song, Yiqiao Zhong


Abstract

Transformers are widely used to extract semantic meanings from input tokens, yet they usually operate as black-box models. In this paper, we present a simple yet informative decomposition of the hidden states (or embeddings) of trained transformers into interpretable components. For any layer, the embedding vectors of input sequence samples are represented by a tensor $h \in \mathbb{R}^{C \times T \times d}$. Given an embedding vector $h_{c,t} \in \mathbb{R}^d$ at sequence position $t \le T$ in a sequence (or context) $c \le C$, extracting the mean effects yields the decomposition \[ h_{c,t} = \mu + \mathrm{pos}_t + \mathrm{ctx}_c + \mathrm{resid}_{c,t} \] where $\mu$ is the global mean vector, $\mathrm{pos}_t$ and $\mathrm{ctx}_c$ are the mean vectors across contexts and across positions respectively, and $\mathrm{resid}_{c,t}$ is the residual vector. For popular transformer architectures and diverse text datasets, we empirically find pervasive mathematical structure: (1) $(\mathrm{pos}_t)_t$ forms a low-dimensional, continuous, and often spiral shape across layers; (2) $(\mathrm{ctx}_c)_c$ shows clear cluster structure that falls into context topics; and (3) $(\mathrm{pos}_t)_t$ and $(\mathrm{ctx}_c)_c$ are mutually nearly orthogonal. We argue that smoothness is pervasive and beneficial to transformers trained on language, and that our decomposition leads to improved model interpretability.
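The mean-effects decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the embedding tensor `h` is synthetic random data standing in for real hidden states, and the shapes `C`, `T`, `d` are arbitrary. The key point is that `mu`, `pos`, `ctx`, and `resid` are obtained purely by averaging, so the four components always sum back to `h` exactly.

```python
import numpy as np

# Hypothetical stand-in for hidden states of one layer:
# C contexts (sequences), T positions, embedding dimension d.
C, T, d = 4, 8, 16
rng = np.random.default_rng(0)
h = rng.normal(size=(C, T, d))

# Global mean vector mu: average over both contexts and positions.
mu = h.mean(axis=(0, 1))                  # shape (d,)

# Positional mean effect pos_t: average over contexts, centered by mu.
pos = h.mean(axis=0) - mu                 # shape (T, d)

# Contextual mean effect ctx_c: average over positions, centered by mu.
ctx = h.mean(axis=1) - mu                 # shape (C, d)

# Residual: whatever is left after removing the three mean effects.
resid = h - mu - pos[None, :, :] - ctx[:, None, :]

# The decomposition reconstructs h exactly, by construction.
assert np.allclose(h, mu + pos[None, :, :] + ctx[:, None, :] + resid)
```

By construction, `pos` averages to zero over positions and `ctx` averages to zero over contexts, which is what makes the components identifiable; the paper's structural findings (spiral `pos`, clustered `ctx`, near-orthogonality) are properties of these averaged vectors for real trained models.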
