PosIR: Position-Aware Heterogeneous Information Retrieval Benchmark
Ziyang Zeng, Dun Zhang, Yu Yan, Xu Sun, Cuiqiaoshu Pan, Yudong Zhou, Yuqing Yang
Abstract
In real-world documents, the information relevant to a user query may reside anywhere from the beginning to the end. This makes position bias -- a systematic tendency of retrieval models to favor or neglect content based on its location -- a critical concern. Although recent studies have identified such bias, existing analyses focus predominantly on English, fail to disentangle document length from information position, and lack a standardized framework for systematic diagnosis. To address these limitations, we introduce PosIR (Position-Aware Information Retrieval), the first standardized benchmark designed to systematically diagnose position bias in diverse retrieval scenarios. PosIR comprises 310 datasets spanning 10 languages and 31 domains, with relevance tied to precise reference spans. At its methodological core, PosIR employs a length-controlled bucketing strategy that groups queries by positive document length and analyzes positional effects within each bucket. This design strictly isolates position bias from length-induced performance degradation. Extensive experiments on 10 state-of-the-art embedding-based retrieval models reveal that: (1) retrieval performance on PosIR with documents exceeding 1536 tokens correlates poorly with the MMTEB benchmark, exposing limitations of current short-text evaluations; (2) position bias is pervasive in embedding models and even increases with document length, with most models exhibiting primacy bias while certain models show unexpected recency bias; (3) as an exploratory investigation, gradient-based saliency analysis further uncovers two distinct internal mechanisms that correlate with these positional preferences. We hope that PosIR can serve as a valuable diagnostic framework to advance the development of position-robust retrieval systems.
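The length-controlled bucketing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bucket boundaries, field names, and sample records are all hypothetical, and the scores stand in for any per-query retrieval metric (e.g., nDCG@10).

```python
from collections import defaultdict

# Hypothetical per-query records: token length of the positive document,
# relative position of the reference span (0.0 = start, 1.0 = end),
# and the retrieval score for that query. Values are illustrative only.
records = [
    {"doc_len": 900,  "span_pos": 0.1, "score": 0.72},
    {"doc_len": 950,  "span_pos": 0.9, "score": 0.55},
    {"doc_len": 3000, "span_pos": 0.1, "score": 0.60},
    {"doc_len": 3100, "span_pos": 0.9, "score": 0.35},
]

LENGTH_EDGES = [0, 1536, 4096, float("inf")]          # assumed length buckets
POS_BINS = [(0.0, 1/3, "early"), (1/3, 2/3, "middle"), (2/3, 1.01, "late")]

def length_bucket(n):
    # Map a document length to its half-open bucket [lo, hi).
    for lo, hi in zip(LENGTH_EDGES, LENGTH_EDGES[1:]):
        if lo <= n < hi:
            return (lo, hi)

def position_bin(p):
    # Map a relative span position to a coarse early/middle/late bin.
    for lo, hi, name in POS_BINS:
        if lo <= p < hi:
            return name

# Aggregate scores within each (length bucket, position bin) cell, so any
# early-vs-late score gap is measured among documents of similar length,
# isolating position bias from length-induced degradation.
sums = defaultdict(lambda: [0.0, 0])
for r in records:
    key = (length_bucket(r["doc_len"]), position_bin(r["span_pos"]))
    sums[key][0] += r["score"]
    sums[key][1] += 1

means = {k: s / n for k, (s, n) in sums.items()}
```

Comparing `means` across position bins *within* one length bucket reveals positional preferences (e.g., primacy bias if "early" consistently beats "late"), while comparisons across buckets capture length effects separately.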