UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained Transformers

2026-03-10Unverified0· sign in to hype

Elias Hossain, Shubhashis Roy Dipta, Subash Neupane, Rajib Rana, Ravid Shwartz-Ziv, Ivan Garibay, Niloofar Yousefi

Unverified — Be the first to reproduce this paper.

Abstract

Neural NLP models are often miscalibrated and overconfident, assigning high confidence to incorrect predictions and failing to express uncertainty during internal evidence aggregation. This undermines selective prediction and high-stakes deployment. Post-hoc calibration methods adjust output probabilities but leave internal computation unchanged, while ensemble and Bayesian approaches improve uncertainty at substantial training or storage cost. We propose UAT-LITE, an inference-time framework that makes self-attention uncertainty-aware via Monte Carlo dropout in pretrained transformer classifiers. Unlike output-level calibration (e.g., TS), UAT-LITE injects epistemic uncertainty directly into attention, enabling uncertainty-aware routing during contextualization and token-level diagnostic signals beyond global logit rescaling. Token-level epistemic uncertainty is estimated from stochastic forward passes and used to modulate self-attention during contextualization, without modifying pretrained weights or training objectives. We additionally introduce a layer-wise variance decomposition to diagnose how predictive uncertainty accumulates across transformer depth. Across SQuAD 2.0 answerability, MNLI, and SST-2, UAT-LITE achieves an average relative ECE reduction of approximately 20% compared with a fine-tuned BERT-base baseline while preserving accuracy, and yields more informative uncertainty behavior for selective prediction under distribution shift.

UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained Transformers

Abstract

Reproductions