Conceptual Edits as Counterfactual Explanations
Giorgos Filandrianos, Konstantinos Thomas, Edmund Dervakos, Giorgos Stamou
Code: github.com/geofila/Conceptual-Edits-as-Counterfactual-Explanations (PyTorch)
Abstract
We propose a framework for generating counterfactual explanations of black-box classifiers, which answer the question “What has to change for this to be classified as X instead of Y?” in terms of given domain knowledge. Specifically, we identify minimal and meaningful “concept edits” which, when applied, change the prediction of a black-box classifier to a desired class. Furthermore, by accumulating multiple counterfactual explanations from interesting regions of a dataset, we propose a method to estimate a “global” counterfactual explanation for such a region and a desired target class. We implement algorithms and report results from preliminary experiments on the CLEVR-Hans3 and COCO datasets. The resulting explanations proved useful, and even inadvertently revealed a bias in the classifier’s training set that was previously unknown to us.
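To make the idea concrete, here is a minimal sketch (not the paper’s implementation) of searching for a counterfactual concept edit. It assumes a toy black-box classifier that operates on a set of concept labels, and uses breadth-first search over single-concept insertions and deletions to find a shortest edit sequence that flips the prediction to a target class; the classifier rules, concept names, and `max_edits` cap are all illustrative assumptions.

```python
from collections import deque


def black_box(concepts):
    """Hypothetical toy black-box: classifies a scene by its concept set."""
    if "gray_cube" in concepts and "large_cylinder" in concepts:
        return "class_1"
    if "small_sphere" in concepts:
        return "class_2"
    return "class_3"


def counterfactual_edits(concepts, target, vocabulary, max_edits=3):
    """BFS over single-concept insert/delete edits.

    Returns a minimal list of edits that changes black_box's prediction
    to `target`, or None if none exists within `max_edits` edits.
    """
    start = frozenset(concepts)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, edits = queue.popleft()
        if black_box(state) == target:
            return edits  # BFS guarantees this is a shortest edit sequence
        if len(edits) == max_edits:
            continue
        # Generate all one-concept neighbors: delete a present concept
        # or insert an absent one from the concept vocabulary.
        neighbors = [(state - {c}, ("delete", c)) for c in state]
        neighbors += [(state | {c}, ("insert", c)) for c in vocabulary - state]
        for nxt, edit in neighbors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edits + [edit]))
    return None


# Example: the scene is class_2; one insertion flips it to class_1.
source = {"gray_cube", "small_sphere"}
vocab = {"gray_cube", "large_cylinder", "small_sphere", "red_cube"}
print(counterfactual_edits(source, "class_1", vocab))
# → [('insert', 'large_cylinder')]
```

In the paper’s setting the edit cost would additionally be weighted by domain knowledge (how semantically distant two concepts are), rather than treating all single-concept edits as equally cheap.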