Geometry without Position? When Positional Embeddings Help and Hurt Spatial Reasoning

2026-01-29

Jian Shi, Michael Birsak, Wenqing Cui, Zhenyu Li, Peter Wonka

Code

github.com/shijianjian/vit-geometry-probes
Official implementation (linked in the paper)

Abstract

This paper revisits the role of positional embeddings (PEs) in vision transformers (ViTs) from a geometric perspective. We show that PEs are not mere token indices but function as geometric priors that shape the spatial structure of the representation. We introduce token-level diagnostics that measure how multi-view geometric consistency in ViT representations depends on consistent PEs. Through extensive experiments on 14 foundation ViT models, we reveal how PEs influence multi-view geometry and spatial reasoning. Our findings clarify the role of PEs as a causal mechanism that governs spatial structure in ViT representations. Our code is available at https://github.com/shijianjian/vit-geometry-probes
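To make the idea of a token-level consistency diagnostic concrete, the following is a minimal toy sketch and not the paper's actual probe. It assumes tokens are formed as content plus an additive PE, that the two views share identical content and a known one-to-one token correspondence, and that consistency is scored as the mean cosine similarity over matched tokens. The names `correspondence_consistency` and `add_pe` are hypothetical and do not come from the linked repository.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def correspondence_consistency(tokens_a, tokens_b, matches):
    """Mean cosine similarity over matched token pairs (i in view A, j in view B).

    Assumption: this scoring rule is illustrative, not the paper's metric.
    """
    return sum(cosine(tokens_a[i], tokens_b[j]) for i, j in matches) / len(matches)


def add_pe(content, pe, order):
    """Toy additive PE model: token k receives the embedding pe[order[k]]."""
    return [
        [c + p for c, p in zip(content[k], pe[order[k]])]
        for k in range(len(content))
    ]


rng = random.Random(0)
n_tokens, dim = 16, 32

# Random stand-ins for patch content and positional embeddings.
content = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
pe = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

identity = list(range(n_tokens))
shifted = [(k + 1) % n_tokens for k in range(n_tokens)]  # every token gets a different PE

# Toy assumption: token k in view A corresponds to token k in view B.
matches = [(k, k) for k in range(n_tokens)]

view_a = add_pe(content, pe, identity)
view_b_consistent = add_pe(content, pe, identity)
view_b_shifted = add_pe(content, pe, shifted)

score_consistent = correspondence_consistency(view_a, view_b_consistent, matches)
score_shifted = correspondence_consistency(view_a, view_b_shifted, matches)

print(f"consistent PEs: {score_consistent:.3f}")
print(f"shifted PEs:    {score_shifted:.3f}")
```

In this toy setting the score drops once corresponding tokens receive different PEs, even though their content is unchanged. A real probe would instead extract tokens from a pretrained ViT and derive correspondences from multi-view geometry.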
