Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, Qiqi Yan
Code
- github.com/ankurtaly/Attributions (official, in paper; TensorFlow, ★ 0)
- github.com/shap/shap (TensorFlow, ★ 25,171)
- github.com/pytorch/captum (PyTorch, ★ 5,583)
- github.com/cdpierse/transformers-interpret (PyTorch, ★ 1,413)
- github.com/jankrepl/mildlyoverfitted (JAX, ★ 348)
- github.com/suinleelab/path_explain (TensorFlow, ★ 192)
- github.com/hannamw/eap-ig (PyTorch, ★ 76)
- github.com/tleemann/road_evaluation (PyTorch, ★ 24)
- github.com/garygsw/smooth-taylor (PyTorch, ★ 15)
- github.com/shaoshanglqy/shap-shapley (TensorFlow, ★ 10)
Abstract
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms---Sensitivity and Implementation Invariance---that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
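As a rough illustration of the "few calls to the gradient operator" the abstract mentions, the sketch below approximates the integrated-gradients path integral with a midpoint Riemann sum on a hypothetical toy model (a ReLU over a dot product, with an analytic gradient). The model, weights, and step count are assumptions for illustration, not part of the paper's experiments; in practice the gradient calls would go through a framework such as TensorFlow or PyTorch.

```python
import numpy as np

# Hypothetical toy model: F(x) = relu(w . x), with an analytic gradient.
w = np.array([1.0, -2.0, 3.0])

def model(x):
    return max(np.dot(w, x), 0.0)

def grad(x):
    # Gradient of relu(w . x) w.r.t. x: w where the pre-activation is positive.
    return w if np.dot(w, x) > 0 else np.zeros_like(w)

def integrated_gradients(x, baseline, steps=100):
    # Midpoint Riemann sum over the straight-line path from baseline to x:
    # IG_i(x) = (x_i - baseline_i) * integral_0^1 dF/dx_i (baseline + a*(x-baseline)) da
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline)
# Completeness: attributions sum to F(x) - F(baseline).
print(attr, attr.sum(), model(x) - model(baseline))
```

Running this, the attributions sum to `F(x) - F(baseline)`, the completeness property that the axiomatic treatment in the paper builds on; on this toy model the midpoint rule recovers the exact attributions because the gradient is constant along the path.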
Benchmark Results
| Dataset | Method | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| CelebA | Integrated Gradients | Insertion AUC score (ArcFace ResNet-101) | 0.36 | — | Unverified |
| CUB-200-2011 | Integrated Gradients | Insertion AUC score (ResNet-101) | 0.04 | — | Unverified |
| VGGFace2 | Integrated Gradients | Insertion AUC score (ArcFace ResNet-101) | 0.54 | — | Unverified |