TrojFSP: Trojan Insertion in Few-shot Prompt Tuning

2023-12-16Unverified0· sign in to hype

Mengxin Zheng, Jiaqi Xue, Xun Chen, Yanshan Wang, Qian Lou, Lei Jiang

Unverified — Be the first to reproduce this paper.

Abstract

Prompt tuning is one of the most effective solutions to adapting a fixed pre-trained language model (PLM) for various downstream tasks, especially with only a few input samples. However, the security issues, e.g., Trojan attacks, of prompt tuning on a few data samples are not well-studied. Transferring established data poisoning attacks directly to few-shot prompt tuning presents multiple challenges. One significant issue is the poisoned imbalance issue, where non-target class samples are added to the target class, resulting in a greater number of target-class samples compared to non-target class. While this issue is not critical in regular tuning, it significantly hampers the few-shot prompt tuning, making it difficult to simultaneously achieve a high attack success rate (ASR) and maintain clean data accuracy (CDA). Additionally, few-shot prompting is prone to overfitting in terms of both ASR and CDA. In this paper, we introduce TrojFSP, a method designed to address the challenges. To solve the poisoned imbalance issue, we develop a Target-Class Shrink (TC-Shrink) technique, which aims to equalize the number of poisoning samples. To combat overfitting, we employ a Selective Token Poisoning technique to boost attack performance. Furthermore, we introduce a Trojan-Trigger Attention objective function to amplify the attention of the poisoned trojan prompt on triggers. Experiments show that our TrojFSP achieves an ASR of over 99\% while maintaining negligible decreases in CDA across various PLMs and datasets.

Tasks

Data Poisoning Language Modelling

TrojFSP: Trojan Insertion in Few-shot Prompt Tuning

Abstract

Tasks

Reproductions