Prompting the Unseen: Detecting Hidden Backdoors in Black-Box Models
Zi-Xuan Huang, Jia-Wei Chen, Zhi-Peng Zhang, Chia-Mu Yu
Abstract
Visual prompting (VP) is a recent technique that adapts a frozen model, pre-trained on a source-domain task, to a target-domain task. This study examines how VP can benefit black-box, model-level backdoor detection. The visual prompt in VP maps class subspaces between the source and target domains. We identify a misalignment, termed class subspace inconsistency, between clean and poisoned datasets. Based on this observation, we introduce BProm, a black-box, model-level method for detecting whether a suspicious model contains a backdoor. BProm exploits the low classification accuracy of prompted models when a backdoor is present. Extensive experiments confirm BProm's effectiveness.
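The detection signal described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the label-mapping dictionary, and the fixed accuracy threshold are all assumptions made for illustration, and the abstract does not specify how BProm turns prompted accuracy into a decision. The sketch only shows the general idea of querying a black-box model through a visual prompt, measuring accuracy on the target task, and treating unusually low accuracy as a warning sign.

```python
from typing import Callable, Dict, Hashable, Sequence


def prompted_accuracy(
    query: Callable[[object], Hashable],
    prompt: Callable[[object], object],
    inputs: Sequence[object],
    labels: Sequence[Hashable],
    label_map: Dict[Hashable, Hashable],
) -> float:
    """Accuracy of a prompted black-box model on a target-domain test set.

    query     -- hypothetical black-box access: input -> predicted source label
    prompt    -- hypothetical visual prompt applied to each target input
    label_map -- hypothetical mapping from source labels to target labels
    """
    if len(inputs) != len(labels) or len(labels) == 0:
        raise ValueError("inputs and labels must be non-empty and equal in length")
    correct = 0
    for x, y in zip(inputs, labels):
        source_pred = query(prompt(x))       # only model outputs are observed
        if label_map.get(source_pred) == y:  # translate to a target label
            correct += 1
    return correct / len(labels)


def flag_suspicious(accuracy: float, threshold: float) -> bool:
    """Assumed decision rule: low prompted accuracy suggests a backdoor."""
    return accuracy < threshold
```

In practice the threshold would have to be calibrated, for example against prompted accuracies of models known to be clean; the fixed cutoff here is a simplification.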