Exact Sequence Classification with Hardmax Transformers
Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua
Abstract
We prove that hardmax attention transformers perfectly classify datasets of N labeled sequences in R^d, d ≥ 2. Specifically, given N sequences of arbitrary but finite length in R^d, we construct a transformer with O(N) blocks and O(Nd) parameters that perfectly classifies this dataset. Our construction achieves the best complexity estimate to date, independent of the length of the sequences, by alternating feed-forward and self-attention layers in a novel way and by capitalizing on the clustering effect inherent to the latter. Our constructive method also uses low-rank parameter matrices within the attention mechanism, a common practice in real-life transformer implementations. Consequently, our analysis holds twofold significance: it substantially advances the mathematical theory of transformers, and it rigorously justifies their exceptional real-world performance in sequence classification tasks.
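To make the setting concrete, the following is a minimal sketch of a hardmax self-attention update: softmax is replaced by a hard selection in which each token attends only to the token(s) maximizing its attention score. The uniform averaging over the argmax set, the residual connection, and the toy rank-1 query matrix are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def hardmax_attention(X, Q, K, V):
    """One hardmax self-attention update on a sequence X of shape (n, d).

    Each token attends only to the token(s) achieving its maximal
    attention score, averaging uniformly over ties (an assumption),
    followed by a residual connection (also an assumption)."""
    scores = (X @ Q) @ (X @ K).T                        # (n, n) score matrix
    hard = (scores == scores.max(axis=1, keepdims=True))  # argmax selection
    W = hard / hard.sum(axis=1, keepdims=True)          # uniform over argmax set
    return X + W @ (X @ V)                              # residual update

# Toy example: 3 tokens in R^2 with a low-rank (rank-1) query matrix,
# echoing the low-rank parameter matrices mentioned in the abstract.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))
u = rng.standard_normal((2, 1))
Q = u @ u.T                                             # rank-1 parameter matrix
K, V = np.eye(2), 0.5 * np.eye(2)
Y = hardmax_attention(X, Q, K, V)
print(Y.shape)  # (3, 2): same shape as the input sequence
```

Because every row of the hardmax weight matrix is a uniform average over a nonempty argmax set, each token's update is a convex combination of value vectors, which is the mechanism behind the clustering effect the abstract alludes to.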