
Locally Distributed Activation Vectors for Guided Feature Attribution

2022-10-01 · COLING 2022

Housam K. B. Bashier, Mi-Young Kim, Randy Goebel


Abstract

Explaining the predictions of a deep neural network (DNN) is a challenging problem. Many attempts at interpreting those predictions have focused on attribution-based methods, which assess the contribution of individual input features to each model prediction. However, attribution-based explanations are not always faithful to the target model; for example, noisy gradients can yield unfaithful feature attributions in back-propagation methods. We present a method to learn explanation-specific representations while constructing deep network models for text classification. These representations can be used to faithfully interpret black-box predictions, i.e., to highlight the most important input features and their role in any particular prediction. We show that learning such representations improves model interpretability across various tasks, under both qualitative and quantitative evaluations, while preserving predictive performance.
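As background for the attribution-based methods the abstract discusses, the sketch below shows a generic gradient-based (input × gradient) attribution for a toy text classifier. This is only an illustration of the baseline family of methods the paper critiques, not the authors' approach; the model, vocabulary size, and token ids are all made-up assumptions.

```python
# Minimal sketch of input-x-gradient feature attribution for a toy text
# classifier (illustrative only; NOT the paper's method). All sizes and
# token ids are arbitrary placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 100, 16, 2

class ToyTextClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.fc = nn.Linear(EMBED_DIM, NUM_CLASSES)

    def forward(self, embedded):
        # Mean-pool token embeddings, then classify.
        return self.fc(embedded.mean(dim=1))

model = ToyTextClassifier().eval()

# A fake tokenized sentence: batch of 1, five token ids.
token_ids = torch.tensor([[4, 17, 42, 8, 63]])

# Look up embeddings and track gradients with respect to them.
embedded = model.embed(token_ids).detach().requires_grad_(True)

logits = model(embedded)
predicted_class = logits.argmax(dim=-1).item()

# Back-propagate the score of the predicted class to the embeddings.
logits[0, predicted_class].backward()

# Input x gradient: one attribution score per token (summed over dims).
attributions = (embedded * embedded.grad).sum(dim=-1).squeeze(0)
for tid, score in zip(token_ids[0].tolist(), attributions.tolist()):
    print(f"token id {tid:3d}: attribution {score:+.4f}")
```

Noisy gradients in sketches like this are exactly the failure mode the abstract cites as motivation for learning explanation-specific representations instead.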
