SPECTRE: An FFT-Based Efficient Drop-In Replacement to Self-Attention for Long Contexts
Jacob Fein-Ashley, Neelesh Gupta, Rajgopal Kannan, Viktor Prasanna
Code: github.com/jacobfa/fft (official PyTorch implementation)
Abstract
Long-context transformers face significant efficiency challenges due to the quadratic cost of self-attention, yet many modern applications, from multi-turn dialogue to high-resolution vision, require contexts spanning tens of thousands of tokens. We introduce SPECTRE, a method that replaces each attention head with a fast real FFT, a content-adaptive spectral gate, and an inverse FFT, reducing per-layer complexity from O(L^2) to O(L log L) while preserving the surrounding architecture. We extend this efficiency to autoregressive generation through our Prefix-FFT cache and enhance local feature representation with an optional wavelet module that adds negligible computational overhead. Our experiments demonstrate that SPECTRE runs up to 7× faster than FlashAttention-2 on 128k-token contexts while matching or exceeding baseline performance on PG-19 language modeling and ImageNet-1k classification. SPECTRE achieves these improvements while adding fewer than 6% additional parameters to the base model, making hundred-kilotoken context processing feasible on commodity GPUs without specialized hardware.
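The FFT → spectral gate → inverse-FFT pipeline described above can be sketched in a few lines. The sketch below uses NumPy rather than the paper's PyTorch code, and the gate is passed in as a fixed array for illustration; in SPECTRE the gate is predicted from the content of the sequence, and the head names here (`spectre_mix`) are placeholders, not the authors' API.

```python
import numpy as np

def spectre_mix(x, gate):
    """Mix tokens in the frequency domain: rFFT -> elementwise gate -> irFFT.

    x    : (L, d) real-valued token sequence for one head
    gate : (L//2 + 1, d) spectral gate (complex or real), applied per frequency bin
    """
    X = np.fft.rfft(x, axis=0)                     # real FFT over sequence: O(L log L)
    Y = X * gate                                   # content-adaptive gating (fixed here)
    return np.fft.irfft(Y, n=x.shape[0], axis=0)   # back to the token domain

L, d = 16, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((L, d))

# Sanity check: an all-ones gate is the identity, so the input is recovered exactly.
y = spectre_mix(x, np.ones((L // 2 + 1, d)))
```

Because the only sequence-length-dependent operations are the two FFTs and an elementwise product, the per-head cost is O(L log L) rather than the O(L^2) of pairwise attention scores.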