
Causal Transformers: Improving the Robustness on Spurious Correlations

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

The fully-connected dependencies in self-attention over-fit spurious correlations and limit generalization to out-of-distribution data. Pre-trained language models (PLMs) alleviate this problem by benefitting from the appreciable counterexamples in large-scale pre-training corpora; however, no prior study has resolved it by improving the model structure. We enforce a causal independence mechanism in the self-attention network, constraining attention mapping graphs (AMGs) to be causal structures. To implement it, we define a smooth loss on the Markov-boundary-constrained directed acyclic graph (DAG) via Lagrange duality, and use it to optimize the AMGs toward causal structures. We then apply this causal attention network to the Transformer (Causal Transformer). Empirical results on two spurious-correlation-challenging (SCC) datasets and on neural machine translation (NMT) and natural language inference (NLI) tasks demonstrate that our Causal Transformer outperforms the state-of-the-art model and improves out-of-distribution prediction.
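The abstract does not spell out the smooth DAG loss. A common smooth acyclicity penalty of this kind is the NOTEARS constraint h(A) = tr(e^{A∘A}) − d, which is zero exactly when the weighted adjacency matrix A is acyclic; the sketch below (an assumption for illustration, not the paper's exact formulation) shows how such a penalty could be evaluated on an attention matrix:

```python
import numpy as np
from scipy.linalg import expm


def acyclicity_penalty(A: np.ndarray) -> float:
    """NOTEARS-style smooth DAG constraint: h(A) = tr(exp(A * A)) - d.

    A is a d x d nonnegative attention/adjacency matrix; A * A is the
    elementwise (Hadamard) square. h(A) == 0 iff the graph encoded by A
    has no directed cycles, and h(A) grows with the weight of cycles,
    so it can be added to a training loss (e.g. via a Lagrangian term)
    to push attention maps toward DAG structures.
    """
    d = A.shape[0]
    return float(np.trace(expm(A * A)) - d)


# A strictly upper-triangular matrix encodes a DAG: penalty is ~0.
dag = np.triu(np.ones((3, 3)), k=1)

# A 2-cycle (0 -> 1 -> 0) yields a strictly positive penalty.
cyclic = np.array([[0.0, 1.0], [1.0, 0.0]])
```

In a training loop, this differentiable penalty (implemented with a framework's matrix exponential, e.g. `torch.matrix_exp`) would typically be weighted by a Lagrange multiplier that is increased until the constraint is satisfied.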
