
Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings

2023-10-10

Timothee Mickus, Raúl Vázquez


Abstract

A recent body of work has demonstrated that Transformer embeddings can be linearly decomposed into well-defined sums of factors, each of which can in turn be related to specific network inputs or components. There is, however, still a dearth of work studying whether these mathematical reformulations are empirically meaningful. In the present work, we study representations from machine-translation decoders using two such embedding decomposition methods. Our results indicate that, while decomposition-derived indicators effectively correlate with model performance, variation across different runs suggests a more nuanced take on this question. The high variability of our measurements indicates that geometry reflects model-specific characteristics more than it does sentence-specific computations, and that similar training conditions do not guarantee similar vector spaces.
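As a rough illustration of the kind of decomposition the abstract refers to (the specific terms below are assumptions for exposition, not taken from the paper), such reformulations express each output embedding as an additive sum of contributions attributable to different parts of the network, along the lines of

$$\mathbf{e}_t = \mathbf{i}_t + \mathbf{h}_t + \mathbf{f}_t + \mathbf{c}_t,$$

where $\mathbf{i}_t$ would stand for an input/residual term, $\mathbf{h}_t$ for the multi-head attention contribution, $\mathbf{f}_t$ for the feed-forward contribution, and $\mathbf{c}_t$ for a bias or correction term. Because the sum is exact, each summand can be measured and traced back to a specific input token or network component, which is what makes such decompositions candidates for empirical analysis.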
