| Title | Date | Tags |
| --- | --- | --- |
| NoPE: The Counting Power of Transformers with No Positional Encodings | May 16, 2025 | Hard Attention |
| Achieving Explainability in a Visual Hard Attention Model through Content Prediction | Jan 1, 2021 | Hard Attention, image-classification |
| A Differentiable Self-disambiguated Sense Embedding Model via Scaled Gumbel Softmax | Sep 27, 2018 | Hard Attention, Sentence |
| AMR Parsing with Action-Pointer Transformer | Nov 24, 2020 | Abstract Meaning Representation, AMR Parsing |
| An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing | Jun 13, 2017 | Automatic Post-Editing, Hard Attention |
| A study of latent monotonic attention variants | Mar 30, 2021 | Hard Attention, speech-recognition |
| AttentionDrop: A Novel Regularization Method for Transformer Models | Apr 16, 2025 | Hard Attention |
| Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits | Aug 6, 2023 | Hard Attention |
| Characterizing the Expressivity of Transformer Language Models | May 29, 2025 | Hard Attention |
| CLAWS: Contrastive Learning with hard Attention and Weak Supervision | Dec 1, 2021 | Anomaly Detection, Contrastive Learning |