Provable Defense against Backdoor Policies in Reinforcement Learning

2022-11-18Code Available0· sign in to hype

Shubham Kumar Bharti, Xuezhou Zhang, Adish Singla, Xiaojin Zhu

Code Available — Be the first to reproduce this paper.

Code

github.com/skbharti/provable-defense-in-rl
OfficialIn paperpytorch★ 9

Abstract

We propose a provable defense mechanism against backdoor policies in reinforcement learning under subspace trigger assumption. A backdoor policy is a security threat where an adversary publishes a seemingly well-behaved policy which in fact allows hidden triggers. During deployment, the adversary can modify observed states in a particular way to trigger unexpected actions and harm the agent. We assume the agent does not have the resources to re-train a good policy. Instead, our defense mechanism sanitizes the backdoor policy by projecting observed states to a 'safe subspace', estimated from a small number of interactions with a clean (non-triggered) environment. Our sanitized policy achieves approximate optimality in the presence of triggers, provided the number of clean interactions is O(D(1-)^4 ^2) where is the discounting factor and D is the dimension of state space. Empirically, we show that our sanitization defense performs well on two Atari game environments.

Tasks

reinforcement-learning Reinforcement Learning Reinforcement Learning (RL)

Provable Defense against Backdoor Policies in Reinforcement Learning

Code

Abstract

Tasks

Reproductions