A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)

2025-02-04

Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han


Code

github.com/academycityl/gali
Official implementation (linked in the paper), PyTorch

Abstract

Transformer-based Large Language Models (LLMs) struggle to process inputs that exceed their training context window: performance degrades because positional out-of-distribution (O.O.D.) issues disrupt attention computations. Existing solutions, both fine-tuning and training-free methods, are limited by computational inefficiency, attention logit outliers, or loss of local positional information. To address this, we propose Greedy Attention Logit Interpolation (GALI), a training-free length extrapolation method that maximizes the utilization of pretrained positional intervals while avoiding attention logit outliers through attention logit interpolation. The results demonstrate that GALI consistently outperforms state-of-the-art training-free methods. Our findings also reveal that LLMs interpret positional intervals unevenly within their training context window, suggesting that extrapolating within a smaller positional interval range yields superior results, even for short-context tasks. GALI represents a significant step toward resolving the positional O.O.D. challenge, enabling more reliable long-text understanding in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/AcademyCityL/GALI.
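The abstract describes attention logit interpolation only at a high level. The following is a minimal pure-Python sketch of the general idea, assuming RoPE positional encoding (as used by Llama 3) with the interleaved pair layout. Relative distances beyond the trained range are first mapped back into it, and the attention logit at the resulting fractional distance is obtained by linearly interpolating the logits at the two neighbouring integer distances, so RoPE is never evaluated at a fractional position. The helpers `rope_rotate`, `rope_logit`, `map_distance` and `interpolated_logit`, the 512-token local window, and the linear compression rule are all hypothetical choices made for this illustration. GALI's actual greedy position assignment and interpolation are defined in the paper and the official repository.

```python
import math
from typing import List


def rope_rotate(x: List[float], pos: float, base: float = 10000.0) -> List[float]:
    """Apply rotary position embedding (RoPE) to x at position pos.

    Assumption: interleaved layout, pairs (x[2i], x[2i+1]) rotated by pos * theta_i
    with theta_i = base ** (-2i / d). Hugging Face's Llama code pairs i with i + d/2
    instead, which only permutes dimensions.
    """
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out: List[float] = []
    for i in range(d // 2):
        theta = base ** (-2 * i / d)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def rope_logit(q: List[float], k: List[float], rel: int) -> float:
    """Logit q.k when the query sits `rel` positions after the key.

    Under RoPE only the relative distance matters, so the key is placed at 0.
    """
    qr = rope_rotate(q, rel)
    kr = rope_rotate(k, 0)
    return sum(a * b for a, b in zip(qr, kr))


def map_distance(rel: int, local: int, trained_max: int, needed_max: int) -> float:
    """Hypothetical mapping: keep distances <= local exact, and linearly compress
    (local, needed_max] into (local, trained_max] so every distance stays in range."""
    if rel <= local:
        return float(rel)
    scale = (trained_max - local) / (needed_max - local)
    return local + (rel - local) * scale


def interpolated_logit(q: List[float], k: List[float], rel_frac: float) -> float:
    """Interpolate between logits at the two neighbouring integer distances
    instead of rotating by a fractional (never-trained) angle."""
    lo = math.floor(rel_frac)
    w = rel_frac - lo
    if w == 0:
        return rope_logit(q, k, lo)
    return (1 - w) * rope_logit(q, k, lo) + w * rope_logit(q, k, lo + 1)


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5]
    k = [1.0, 0.4, -0.7, 0.9]
    # Distance 6000 for a model trained on distances up to 4095 (hypothetical numbers),
    # keeping the nearest 512 positions exact.
    r = map_distance(6000, local=512, trained_max=4095, needed_max=16383)
    print(r, interpolated_logit(q, k, r))
```

One plausible reading of the design: each interpolated logit is a convex combination of two logits the model already produced at trained integer distances, so it cannot leave the range those logits span. This is one way to avoid the attention logit outliers the abstract mentions, while distances inside the local window keep their exact positional information.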

Tasks

Long-Context Understanding

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
L-Eval	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	59.21	—	Unverified
L-Eval	GALI(Llama3-8b-ins-4k-to-32k)	Average Score	59.1	—	Unverified
L-Eval	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	42.79	—	Unverified
L-Eval	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	42.32	—	Unverified
LongBench	GALI(Llama3-8b-ins-8k-to-32k)	Average Score	45.38	—	Unverified
LongBench	GALI(Llama3-8b-ins-8k-to-16k)	Average Score	45.17	—	Unverified
LongBench	GALI(Llama3-8b-ins-4k-to-16k)	Average Score	46.22	—	Unverified
