Room Impulse Responses help attackers to evade Deep Fake Detection

2024-09-23

Hieu-Thi Luong, Duc-Tuan Truong, Kong Aik Lee, Eng Siong Chng

Abstract

The ASVspoof 2021 benchmark, a widely used evaluation framework for anti-spoofing, consists of two subsets: Logical Access (LA) and Deepfake (DF), featuring samples with varied coding characteristics and compression artifacts. The current state-of-the-art (SOTA) system achieves an Equal Error Rate (EER) of 0.87% on the LA subset and 2.58% on the DF subset. However, benchmark accuracy is no guarantee of robustness in real-world scenarios. This paper investigates whether room impulse responses (RIRs) can be used to process fake speech so that it is more likely to evade fake speech detection systems. Our findings reveal that this simple approach significantly improves the evasion rate, doubling the SOTA system's EER. To counter this type of attack, we augmented the training data with a large-scale synthetic/simulated RIR dataset. The results demonstrate significant improvement on both reverberated fake speech and original samples, reducing the EER on the DF task to 2.13%.
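As a rough illustration of what applying an RIR to a speech signal means, the sketch below convolves a waveform with a toy impulse response. It is not the authors' pipeline: the function names `synthetic_rir` and `reverberate`, the exponentially decaying noise model, and the default parameter values are all assumptions made for this example. The same convolution could serve either as the attack (reverberating fake speech) or as a training-data augmentation.

```python
import numpy as np

def synthetic_rir(sample_rate=16000, rt60=0.5, length_s=0.6, seed=0):
    """Toy RIR (assumed model): white noise under an exponential decay envelope."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * length_s)
    t = np.arange(n) / sample_rate
    # Envelope drops by 60 dB (a factor of 1000) after rt60 seconds.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct-path component
    return rir / np.max(np.abs(rir))

def reverberate(speech, rir):
    """Convolve speech with an RIR, keep the input length, match the input peak."""
    wet = np.convolve(speech, rir, mode="full")[: len(speech)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(speech)) / peak)
    return wet

if __name__ == "__main__":
    sr = 16000
    # Stand-in for a speech waveform: 0.1 s of a 220 Hz tone.
    speech = np.sin(2 * np.pi * 220 * np.arange(sr // 10) / sr)
    wet = reverberate(speech, synthetic_rir(sample_rate=sr, length_s=0.05))
    print(wet.shape)
```

In practice a measured or simulated RIR would replace `synthetic_rir`, and an FFT-based convolution would be preferable for long signals.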

Tasks

Face Swapping
