ECINN: Efficient Counterfactuals from Invertible Neural Networks

2021-03-25

Frederik Hvilshøj, Alexandros Iosifidis, Ira Assent

Code

github.com/fhvilshoj/ecinn
PyTorch, 8 stars

Abstract

Counterfactual examples identify how inputs can be altered to change the predicted class of a classifier, thus opening up the black-box nature of, e.g., deep neural networks. We propose a method, ECINN, that utilizes the generative capacities of invertible neural networks for image classification to generate counterfactual examples efficiently. In contrast to competing methods that sometimes need a thousand evaluations or more of the classifier, ECINN has a closed-form expression and generates a counterfactual in the time of only two evaluations. Arguably, the main challenge of generating counterfactual examples is to alter only input features that affect the predicted outcome, i.e., class-dependent features. Our experiments demonstrate how ECINN alters class-dependent image regions to change the perceptual and predicted class of the counterfactuals. Additionally, we extend ECINN to also produce heatmaps (ECINNh) for easy inspection of, e.g., pairwise class-dependent changes in the generated counterfactual examples. Experimentally, we find that ECINNh outperforms established methods that generate heatmap-based explanations.
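The abstract does not state the closed-form expression itself, so the following is only a minimal sketch of one plausible reading: encode the input with the invertible network, shift the latent code by the difference between the target and source class means, and decode with the inverse. That would account for the cost of roughly two evaluations. The elementwise affine map standing in for the network, the names `forward`, `inverse`, `nearest_class`, and `counterfactual`, and the step size `alpha` are all assumptions made for illustration and are not taken from the paper or its repository.

```python
# Toy stand-in for an invertible network: an elementwise affine map.
# SCALE and SHIFT are hypothetical parameters chosen for illustration only.
SCALE = [2.0, 0.5, 4.0]
SHIFT = [1.0, -1.0, 0.0]


def forward(x):
    """Map an input x to its latent code z (first evaluation)."""
    return [s * xi + b for xi, s, b in zip(x, SCALE, SHIFT)]


def inverse(z):
    """Map a latent code z back to input space (second evaluation)."""
    return [(zi - b) / s for zi, s, b in zip(z, SCALE, SHIFT)]


def nearest_class(z, class_means):
    """Predict the class whose latent mean is closest to z."""
    def sq_dist(mu):
        return sum((zi - mi) ** 2 for zi, mi in zip(z, mu))
    return min(class_means, key=lambda c: sq_dist(class_means[c]))


def counterfactual(x, source, target, class_means, alpha=1.0):
    """Shift the latent code of x from the source class mean toward the target.

    Assumed form: x_cf = inverse(forward(x) + alpha * (mu_target - mu_source)).
    Latent dimensions where the two means agree are left unchanged.
    """
    z = forward(x)
    delta = [t - s for s, t in zip(class_means[source], class_means[target])]
    z_cf = [zi + alpha * di for zi, di in zip(z, delta)]
    return inverse(z_cf)
```

In this toy setting, if two class means differ in a single latent dimension, only the matching input coordinate changes, which mirrors the abstract's goal of altering only class-dependent features. The elementwise difference between `x` and the returned counterfactual could also serve as a crude heatmap, although how ECINNh actually builds its heatmaps is not described here.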

Tasks

Counterfactual, Image Classification
