Simulating Hard Attention Using Soft Attention
Andy Yang, Lena Strobl, David Chiang, Dana Angluin
Abstract
We study conditions under which transformers using soft attention can simulate hard attention, that is, effectively focus all attention on a subset of positions. First, we examine several subclasses of languages recognized by hard-attention transformers, which can be defined in variants of linear temporal logic. We demonstrate how soft-attention transformers can compute formulas of these logics using unbounded positional embeddings or temperature scaling. Second, we demonstrate how temperature scaling allows softmax transformers to simulate general hard-attention transformers, using a temperature that depends on the minimum gap between the maximum attention scores and other attention scores.
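As a rough illustration of the second result (not taken from the paper itself), the sketch below shows how dividing attention scores by a small temperature pushes the softmax distribution toward the hard-attention (argmax) distribution, and why the temperature needed depends on the gap between the maximum score and the runner-up. The scores and temperature values are hypothetical; this is a minimal numerical sketch, not the paper's construction.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical attention scores at one query position
scores = np.array([2.0, 3.5, 3.1, 1.0])

# Gap between the maximum score and the second-largest score;
# the smaller this gap, the lower the temperature must be
gap = np.sort(scores)[-1] - np.sort(scores)[-2]
print("gap between max and runner-up:", gap)

for temperature in [1.0, 0.1, 0.01]:
    weights = softmax(scores / temperature)
    print(f"T = {temperature:5.2f}  ->  {np.round(weights, 4)}")

# As the temperature shrinks relative to the gap, the soft attention
# weights concentrate on the argmax position, approximating hard attention.
```

Running this prints attention weights that move from a spread-out distribution at T = 1.0 to nearly one-hot at T = 0.01, mirroring the intuition that a sufficiently low temperature (chosen relative to the minimum score gap) lets softmax attention mimic hard attention.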