Prompting the Unseen: Detecting Hidden Backdoors in Black-Box Models

2024-11-14

Zi-Xuan Huang, Jia-Wei Chen, Zhi-Peng Zhang, Chia-Mu Yu

Abstract

Visual prompting (VP) is a recent technique that adapts a frozen model, pre-trained on a source-domain task, to a target-domain task. This study examines how VP can benefit black-box, model-level backdoor detection. The visual prompt in VP maps class subspaces between the source and target domains. We identify a misalignment, termed class subspace inconsistency, between clean and poisoned datasets. Based on this, we introduce BProm, a black-box, model-level method for detecting whether a suspicious model contains a backdoor. BProm exploits the low classification accuracy of prompted models when a backdoor is present. Extensive experiments confirm BProm's effectiveness.
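
To make the accuracy signal concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a border-style visual prompt (the target image pasted into the center of a larger prompt canvas), a black-box model exposed only as a label-returning function, an identity mapping between source and target labels, and a fixed accuracy threshold as the decision rule. The abstract specifies none of these, and all names (apply_prompt, prompted_accuracy, looks_backdoored, the 0.5 threshold) are hypothetical. Prompt training is omitted; the prompt is taken as given.

```python
from typing import Callable, List, Sequence

# Toy image format (an assumption): a 2D list of grayscale pixel values.
Image = List[List[float]]


def apply_prompt(image: Image, prompt: Image, pad: int) -> Image:
    """Paste the target image into the prompt canvas, offset by `pad` pixels.

    The prompt pixels outside the pasted region act as the visual prompt.
    """
    out = [row[:] for row in prompt]  # copy so the prompt is not mutated
    for i, row in enumerate(image):
        for j, value in enumerate(row):
            out[pad + i][pad + j] = value
    return out


def prompted_accuracy(
    query_model: Callable[[Image], int],  # black-box access: labels only
    images: Sequence[Image],
    labels: Sequence[int],
    prompt: Image,
    pad: int,
) -> float:
    """Accuracy of the prompted model on a labeled target-domain set."""
    if not labels:
        raise ValueError("need at least one labeled example")
    correct = sum(
        1
        for image, label in zip(images, labels)
        if query_model(apply_prompt(image, prompt, pad)) == label
    )
    return correct / len(labels)


def looks_backdoored(accuracy: float, threshold: float = 0.5) -> bool:
    """Flag the model when prompted accuracy is low.

    The fixed threshold is a placeholder assumption, not a value from the paper.
    """
    return accuracy < threshold
```

In use, one would obtain a prompt for the suspicious model, compute prompted_accuracy on held-out target-domain data, and pass the result to looks_backdoored. How the threshold is chosen, or whether a learned decision rule replaces it, is left open here.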

Tasks

Visual Prompting
