A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment
Raanan Y. Rohekar, Yaniv Gurwicz, Sungduk Yu, Estelle Aflalo, Vasudev Lal
Abstract
Do generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learn a world model from which a sequence is generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and suggesting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models can be used at inference time for zero-shot causal structure learning on input sequences, and we present a confidence score. Empirical evaluation is conducted in a controlled environment using the setup and rules of the strategy games Othello and Chess. A GPT, pre-trained on real-world games played with the intention of winning, is tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases where the GPT model generates illegal moves, it also fails to capture any causal structure.
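To make the out-of-distribution test data concrete, the following is a minimal sketch of how a sequence of random legal Othello moves could be sampled. It is an illustration under stated assumptions, not the authors' data pipeline: the function names (`initial_board`, `flips`, `legal_moves`, `random_legal_game`), the board encoding (1 for black, -1 for white, 0 for empty), and the move notation (column letter plus row number, such as `d3`) are all hypothetical choices made for this example.

```python
import random

# The eight directions in which a line of opponent discs can be captured.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def initial_board():
    """Standard Othello start: 1 = black, -1 = white, 0 = empty (assumed encoding)."""
    board = [[0] * 8 for _ in range(8)]
    board[3][3], board[4][4] = -1, -1  # white on d4 and e5
    board[3][4], board[4][3] = 1, 1    # black on e4 and d5
    return board


def flips(board, row, col, player):
    """Return the opponent discs that `player` would flip by playing (row, col)."""
    if board[row][col] != 0:
        return []
    captured = []
    for dr, dc in DIRECTIONS:
        line = []
        r, c = row + dr, col + dc
        # Walk over a contiguous run of opponent discs.
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == -player:
            line.append((r, c))
            r, c = r + dr, c + dc
        # The run counts only if it is closed off by one of the player's own discs.
        if line and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            captured.extend(line)
    return captured


def legal_moves(board, player):
    """A move is legal if it flips at least one opponent disc."""
    return [(r, c) for r in range(8) for c in range(8) if flips(board, r, c, player)]


def random_legal_game(rng):
    """Sample one game by choosing uniformly among legal moves at every turn."""
    board, player, moves = initial_board(), 1, []  # black moves first
    while True:
        options = legal_moves(board, player)
        if not options:
            player = -player  # the current player must pass
            options = legal_moves(board, player)
            if not options:
                break  # neither player can move: the game is over
        row, col = rng.choice(options)
        for r, c in flips(board, row, col, player) + [(row, col)]:
            board[r][c] = player
        moves.append("abcdefgh"[col] + str(row + 1))
        player = -player
    return moves
```

Calling `random_legal_game(random.Random(0))` returns one move sequence as a list of square names. Because moves are drawn uniformly rather than chosen to win, such sequences differ in distribution from games played with the intention of winning, which is the contrast the evaluation relies on.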