Counterfactual Inference for Text Classification Debiasing

2021-08-01 · ACL 2021

Chen Qian, Fuli Feng, Lijie Wen, Chunping Ma, Pengjun Xie


Code

github.com/qianc62/corsair
Official · In paper · PyTorch · ★ 82

Abstract

Today's text classifiers inevitably suffer from unintended dataset biases, especially document-level label bias and word-level keyword bias, which may hurt models' generalization. Many previous studies employed data-level manipulations or model-level balancing mechanisms to recover unbiased distributions and thus prevent models from capturing the two types of biases. Unfortunately, they either incur the extra cost of data collection, selection, or annotation, or need an elaborate design of balancing strategies. Unlike traditional factual inference, in which debiasing occurs before or during training, counterfactual inference mitigates the influence of unintended confounders after training, so that unbiased decisions can be made from biased observations. Inspired by this, we propose a model-agnostic text classification debiasing framework, Corsair, which avoids both data manipulations and hand-designed balancing mechanisms. Concretely, Corsair first trains a base model directly on a training set, allowing the dataset biases to 'poison' the trained model. At inference time, given a factual input document, Corsair imagines its two counterfactual counterparts to distill and mitigate the two biases captured by the poisoned model. Extensive experiments demonstrate Corsair's effectiveness, generalizability, and fairness.
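The inference procedure described in the abstract can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the masking scheme (a fully masked document to expose label bias, a document with only suspected keywords left visible to expose keyword bias), the subtraction of weighted bias scores, the weights `lam_label` and `lam_keyword`, the `MASK` token, and the hand-built `toy_model`. It is not the code from the official repository.

```python
from typing import Callable, List, Sequence, Set

MASK = "[MASK]"  # hypothetical placeholder token, chosen for this sketch

Model = Callable[[Sequence[str]], List[float]]

# Toy "poisoned" model over two classes [negative, positive].
# BIAS stands in for a learned label bias, and the weight on "movie"
# for a spurious keyword bias. All numbers are made up.
BIAS = (0.0, 2.0)
WEIGHTS = {"boring": (3.0, 0.0), "movie": (0.0, 1.5)}


def toy_model(tokens: Sequence[str]) -> List[float]:
    """Return class scores as the bias plus per-token weights."""
    scores = list(BIAS)
    for token in tokens:
        w = WEIGHTS.get(token, (0.0, 0.0))  # unknown tokens and MASK add nothing
        scores = [s + wi for s, wi in zip(scores, w)]
    return scores


def fully_masked(tokens: Sequence[str]) -> List[str]:
    """Counterfactual 1 (assumed): no content visible, only label bias remains."""
    return [MASK] * len(tokens)


def partially_masked(tokens: Sequence[str], keywords: Set[str]) -> List[str]:
    """Counterfactual 2 (assumed): only suspected bias keywords stay visible."""
    return [t if t in keywords else MASK for t in tokens]


def debiased_scores(
    model: Model,
    tokens: Sequence[str],
    keywords: Set[str],
    lam_label: float = 1.0,
    lam_keyword: float = 1.0,
) -> List[float]:
    """Subtract weighted counterfactual scores from the factual scores."""
    factual = model(tokens)
    label_bias = model(fully_masked(tokens))
    keyword_bias = model(partially_masked(tokens, keywords))
    return [
        f - lam_label * lb - lam_keyword * kb
        for f, lb, kb in zip(factual, label_bias, keyword_bias)
    ]


doc = ["a", "boring", "movie"]
factual = toy_model(doc)                               # [3.0, 3.5]
debiased = debiased_scores(toy_model, doc, {"movie"})  # [3.0, -2.0]
```

In this toy case the factual scores favor the positive class only because of the label bias and the keyword "movie", while the debiased scores favor the negative class. In practice the two weights would need to be tuned, for example on a validation set.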

Tasks

Classification, counterfactual, Counterfactual Inference, Fairness, text-classification, Text Classification
