Probing the topology of the space of tokens with structured prompts
2025-03-19
Michael Robinson, Sourya Dey, Taisa Kushner
Abstract
This article presents a general and flexible method for prompting a large language model (LLM) to reveal its (hidden) token input embedding up to homeomorphism. Moreover, this article provides strong theoretical justification, in the form of a mathematical proof for generic LLMs, for why this method should be expected to work. With this method in hand, we demonstrate its effectiveness by recovering the token subspace of Llemma-7B. The results of this paper apply not only to LLMs but also to general nonlinear autoregressive processes.