Interpretability Techniques for Deep Learning
Benchmark Results
| # | Method | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DAS | Log odds-ratio (pythia-6.9b) | 9.95 | — | Unverified |
| 2 | Linear probe | Log odds-ratio (pythia-6.9b) | 3.42 | — | Unverified |
| 3 | Difference-in-means | Log odds-ratio (pythia-6.9b) | 2.91 | — | Unverified |
| 4 | k-means | Log odds-ratio (pythia-6.9b) | 1.87 | — | Unverified |
| 5 | PCA | Log odds-ratio (pythia-6.9b) | 1.81 | — | Unverified |
| 6 | LDA | Log odds-ratio (pythia-6.9b) | 0.27 | — | Unverified |
| 7 | Random | Log odds-ratio (pythia-6.9b) | 0.01 | — | Unverified |
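
This page does not define the log odds-ratio metric used in the first table. The sketch below assumes that the metric measures how far an intervention on a model's internal representation shifts the model's preference from the base label toward the source (counterfactual) label. Under that assumption, it is the log odds of the source label over the base label after the intervention minus the same quantity before it. The helper names and the four-probability interface are hypothetical and are chosen only for illustration.

```python
import math


def label_log_odds(p_source: float, p_base: float) -> float:
    """Log odds of the source label relative to the base label."""
    if p_source <= 0 or p_base <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_source) - math.log(p_base)


def log_odds_ratio(
    p_source_before: float,
    p_base_before: float,
    p_source_after: float,
    p_base_after: float,
) -> float:
    """Shift in label log odds caused by an intervention (hypothetical helper).

    Assumption: positive values mean the intervention moved the model's
    prediction toward the source (counterfactual) label.
    """
    return label_log_odds(p_source_after, p_base_after) - label_log_odds(
        p_source_before, p_base_before
    )


if __name__ == "__main__":
    # Before the intervention the model strongly prefers the base label;
    # afterwards the preference is reversed.
    print(log_odds_ratio(0.1, 0.9, 0.9, 0.1))  # ≈ 4.394 (= 2 * ln 9)
```

Under this reading, an intervention that leaves the output probabilities unchanged scores exactly 0. That is consistent with the near-zero score of the Random baseline above.
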

| # | Method | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RISE | Insertion AUC score | 0.57 | — | Unverified |
| 2 | HSIC-Attribution | Insertion AUC score | 0.57 | — | Unverified |
| 3 | Kernel SHAP | Insertion AUC score | 0.52 | — | Unverified |
| 4 | LIME | Insertion AUC score | 0.52 | — | Unverified |
| 5 | Saliency | Insertion AUC score | 0.46 | — | Unverified |
| 6 | Grad-CAM | Insertion AUC score | 0.37 | — | Unverified |
| 7 | Integrated Gradients | Insertion AUC score | 0.36 | — | Unverified |
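
Insertion AUC, used in the second table, scores an attribution map with a simple procedure. It starts from a baseline input and inserts the original features in order of decreasing attribution. After each step, it records the model's score for the target class. The area under the resulting curve is higher when the features ranked first really drive the prediction. The following is a minimal sketch, not the benchmark's implementation. It assumes a flat list of features, a `model` callable that returns a scalar class score, and a trapezoidal-rule AUC over the fraction of features inserted. The baseline and step size behind the reported numbers are not given here. For images, RISE-style evaluations typically start from a blurred copy of the input.

```python
from typing import Callable, List, Sequence


def insertion_auc(
    image: Sequence[float],
    baseline: Sequence[float],
    attribution: Sequence[float],
    model: Callable[[List[float]], float],
    step: int = 1,
) -> float:
    """Area under the insertion curve for one input (illustrative sketch).

    Features are copied from `image` into a copy of `baseline` in order of
    decreasing `attribution`, `step` features at a time. `model` returns the
    target-class score for the partially restored input.
    """
    n = len(image)
    if n == 0 or len(baseline) != n or len(attribution) != n or step < 1:
        raise ValueError("need non-empty, equal-length inputs and step >= 1")
    # Indices of the most important features first.
    order = sorted(range(n), key=lambda i: attribution[i], reverse=True)
    current = list(baseline)
    xs = [0.0]                 # fraction of features inserted so far
    scores = [model(current)]  # score on the untouched baseline
    for start in range(0, n, step):
        for i in order[start:start + step]:
            current[i] = image[i]
        xs.append(min(start + step, n) / n)
        scores.append(model(current))
    # Trapezoidal rule over the (fraction inserted, score) curve.
    return sum(
        (xs[k + 1] - xs[k]) * (scores[k] + scores[k + 1]) / 2
        for k in range(len(xs) - 1)
    )


if __name__ == "__main__":
    # Toy model whose score depends only on feature 0.
    def toy_model(x: List[float]) -> float:
        return x[0]

    image = [1.0, 1.0, 1.0, 1.0]
    baseline = [0.0, 0.0, 0.0, 0.0]
    print(insertion_auc(image, baseline, [4, 3, 2, 1], toy_model))  # 0.875
    print(insertion_auc(image, baseline, [1, 2, 3, 4], toy_model))  # 0.125
```

With the toy model, an attribution that ranks feature 0 first yields a higher AUC (0.875) than one that ranks it last (0.125). That ordering is the behavior the metric is meant to reward.
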