Exploring the Robustness of Language Models for Tabular Question Answering via Attention Analysis

2024-06-18

Kushal Raj Bhandari, Sixue Xing, Soham Dan, Jianxi Gao

Abstract

Large Language Models (LLMs), already known to excel at a range of text comprehension tasks, have also been shown to handle table comprehension without task-specific training. Building on earlier studies of LLMs for tabular tasks, we probe how in-context learning (ICL), model scale, instruction tuning, and domain bias affect the robustness of Tabular Question Answering (TQA) by evaluating LLMs, under diverse augmentations and perturbations, on datasets from three domains: Wikipedia-based WTQ, financial TAT-QA, and scientific SCITAB. Although instruction tuning and larger, newer LLMs deliver stronger, more robust TQA performance, data contamination and reliability issues, especially on WTQ, remain unresolved. Through an in-depth attention analysis, we reveal a strong correlation between perturbation-induced shifts in attention dispersion and drops in performance, with sensitivity peaking in the model's middle layers. We highlight the need for improved interpretable methodologies to develop more reliable LLMs for table comprehension.
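The abstract does not specify how "attention dispersion" is measured. A common way to quantify dispersion is the Shannon entropy of each attention row, averaged per layer; the sketch below illustrates that idea with synthetic attention maps (the paper's exact metric and the function names here are assumptions, not the authors' code).

```python
import numpy as np

def attention_entropy(attn):
    """Shannon entropy of each attention row (last axis); higher = more dispersed."""
    attn = np.clip(attn, 1e-12, None)
    return -(attn * np.log(attn)).sum(axis=-1)

def dispersion_shift(attn_orig, attn_pert):
    """Per-layer mean change in attention entropy between original and perturbed
    inputs. Each argument is a sequence of arrays shaped (heads, seq, seq)."""
    return np.array([
        attention_entropy(p).mean() - attention_entropy(o).mean()
        for o, p in zip(attn_orig, attn_pert)
    ])

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy example: 3 layers, 2 heads, 4-token sequence. Sharper logits give lower
# entropy, so the flatter "perturbed" run shows a positive shift in every layer.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 2, 4, 4))
attn_orig = softmax(logits * 4.0)   # sharp attention
attn_pert = softmax(logits * 1.0)   # more dispersed attention
shift = dispersion_shift(attn_orig, attn_pert)
print(shift)  # one positive entropy shift per layer
```

With a real model, the per-layer attention maps would come from the model's attention outputs for the original and perturbed table inputs; correlating `shift` with the accuracy drop across perturbations would mirror the layer-wise analysis the abstract describes.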
