Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, Qiqi Yan
Code
- github.com/ankurtaly/Attributions (official, in paper; TensorFlow, ★ 0)
- github.com/shap/shap (TensorFlow, ★ 25,171)
- github.com/pytorch/captum (PyTorch, ★ 5,583)
- github.com/cdpierse/transformers-interpret (PyTorch, ★ 1,413)
- github.com/jankrepl/mildlyoverfitted (JAX, ★ 348)
- github.com/suinleelab/path_explain (TensorFlow, ★ 192)
- github.com/hannamw/eap-ig (PyTorch, ★ 76)
- github.com/tleemann/road_evaluation (PyTorch, ★ 24)
- github.com/garygsw/smooth-taylor (PyTorch, ★ 15)
- github.com/shaoshanglqy/shap-shapley (TensorFlow, ★ 10)
Abstract
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms---Sensitivity and Implementation Invariance---that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
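As a rough illustration of the "few calls to the gradient operator" the abstract mentions, the sketch below approximates the integrated-gradients path integral with a midpoint Riemann sum on a hypothetical toy model (a ReLU over a dot product, with an analytic gradient). The model, weights, and step count are assumptions for illustration, not part of the paper's experiments; in practice the gradient calls would go through a framework such as TensorFlow or PyTorch.

```python
import numpy as np

# Hypothetical toy model: F(x) = relu(w . x), with an analytic gradient.
w = np.array([1.0, -2.0, 3.0])

def model(x):
    return max(np.dot(w, x), 0.0)

def grad(x):
    # Gradient of relu(w . x) w.r.t. x: w where the pre-activation is positive.
    return w if np.dot(w, x) > 0 else np.zeros_like(w)

def integrated_gradients(x, baseline, steps=100):
    # Midpoint Riemann sum over the straight-line path from baseline to x:
    # IG_i(x) = (x_i - baseline_i) * integral_0^1 dF/dx_i (baseline + a*(x-baseline)) da
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline)
# Completeness: attributions sum to F(x) - F(baseline).
print(attr, attr.sum(), model(x) - model(baseline))
```

Running this, the attributions sum to `F(x) - F(baseline)`, the completeness property that the axiomatic treatment in the paper builds on; on this toy model the midpoint rule recovers the exact attributions because the gradient is constant along the path.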
Benchmark Results
| Dataset | Method | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| CelebA | Integrated Gradients | Insertion AUC score (ArcFace ResNet-101) | 0.36 | — | Unverified |
| CUB-200-2011 | Integrated Gradients | Insertion AUC score (ResNet-101) | 0.04 | — | Unverified |
| VGGFace2 | Integrated Gradients | Insertion AUC score (ArcFace ResNet-101) | 0.54 | — | Unverified |