Toward More Generalized Malicious URL Detection Models

2022-02-21 · Code available

YunDa Tsai, Cayon Liow, Yin Sheng Siang, Shou-De Lin

Abstract

This paper reveals a data bias issue that can severely degrade performance when building a machine learning model for malicious URL detection. We describe how such bias can be identified using interpretable machine learning techniques, and further argue that such biases naturally exist in the real-world security data used to train classification models. We then propose a debiased training strategy that can be applied to most deep-learning-based models to alleviate the negative effects of the biased features. The solution uses self-supervised adversarial training to train deep neural networks to learn invariant embeddings from biased data. We conduct a wide range of experiments to demonstrate that the proposed strategy leads to significantly better generalization capability for both CNN-based and RNN-based detection models.
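
The abstract does not spell out the training procedure, so the following is only a minimal illustrative sketch of one common way to realize adversarial training for invariant embeddings: a gradient-reversal-style update. It is not the authors' implementation. Everything here is an assumption made for illustration: the one-dimensional linear encoder, the two logistic heads, the names `combine_encoder_grads` and `train_step`, and the idea that a bias label `s` can be derived from the URL itself.

```python
import math


def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def combine_encoder_grads(task_grad, bias_grad, lam):
    """Gradient-reversal-style combination (illustrative assumption).

    The encoder descends the task loss but ascends the bias loss,
    scaled by lam, so the embedding becomes less predictive of the bias.
    """
    return [g_t - lam * g_b for g_t, g_b in zip(task_grad, bias_grad)]


def train_step(w, a, b, x, y, s, lam=1.0, lr=0.1):
    """One SGD step on a single example (hypothetical toy model).

    w: encoder weights; the embedding is the scalar z = w . x
    a: weight of the task head (malicious vs. benign, label y)
    b: weight of the bias head (predicts the bias label s)
    """
    z = sum(w_i * x_i for w_i, x_i in zip(w, x))

    # For logistic loss, dL/d(logit) = sigmoid(logit) - label.
    err_task = sigmoid(a * z) - y
    err_bias = sigmoid(b * z) - s

    # Gradients of each loss with respect to the encoder weights.
    task_grad = [err_task * a * x_i for x_i in x]
    bias_grad = [err_bias * b * x_i for x_i in x]

    # The encoder receives the reversed bias gradient.
    enc_grad = combine_encoder_grads(task_grad, bias_grad, lam)
    new_w = [w_i - lr * g for w_i, g in zip(w, enc_grad)]

    # Both heads minimize their own loss in the usual way.
    new_a = a - lr * err_task * z
    new_b = b - lr * err_bias * z
    return new_w, new_a, new_b
```

In this toy setup, setting `lam` to 0 recovers ordinary training, while larger values push the encoder harder toward an embedding from which the bias head cannot recover `s`. In a real CNN- or RNN-based detector the same idea would normally be expressed through an autodiff framework rather than hand-written gradients.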
