AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection

2022-06-30 · Code Available

Marius Dragoi, Elena Burceanu, Emanuela Haller, Andrei Manolache, Florin Brad

Abstract

Analyzing distribution shift in data is a growing research direction in modern Machine Learning (ML), and it has led to new benchmarks that provide suitable scenarios for studying the generalization properties of ML models. Existing benchmarks focus on supervised learning; to the best of our knowledge, none exists for unsupervised learning. We therefore introduce an unsupervised anomaly detection benchmark with data that shifts over time, built on Kyoto-2006+, a traffic dataset for network intrusion detection. This type of data meets the premise of a shifting input distribution: it covers a large time span (10 years), with naturally occurring changes over time (e.g., users modifying their behavior patterns, and software updates). We first highlight the non-stationary nature of the data using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years. Next, we propose AnoShift, a protocol that splits the data into IID, NEAR, and FAR testing splits. We validate the performance degradation over time with diverse models, ranging from classical approaches to deep learning. Finally, we show that acknowledging the distribution shift problem and properly addressing it improves performance compared to classical training, which assumes independent and identically distributed data (by up to 3% on average for our approach). The dataset and code are available at https://github.com/bit-ml/AnoShift/.
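The split protocol described in the abstract can be illustrated with a minimal sketch. It assumes each record carries the year it was captured in. The year boundaries, the `iid_every` hold-out rule, and the function names are hypothetical choices made for illustration; the abstract does not specify them, and the authors' repository may do this differently.

```python
from collections import defaultdict

# Hypothetical year boundaries, chosen only for illustration;
# the abstract does not state which years fall in each split.
TRAIN_YEARS = range(2006, 2011)
NEAR_YEARS = range(2011, 2014)
FAR_YEARS = range(2014, 2016)


def assign_split(year, index, iid_every=10):
    """Return the split name for the record at position `index` from `year`.

    Records from the training period go to 'train', except every
    `iid_every`-th record, which is held out as the 'iid' test split.
    """
    if year in TRAIN_YEARS:
        return "iid" if index % iid_every == 0 else "train"
    if year in NEAR_YEARS:
        return "near"
    if year in FAR_YEARS:
        return "far"
    raise ValueError(f"year {year} is outside the covered time span")


def split_by_time(records, iid_every=10):
    """Group (year, features) records into train / iid / near / far lists."""
    splits = defaultdict(list)
    for index, (year, features) in enumerate(records):
        splits[assign_split(year, index, iid_every)].append(features)
    return dict(splits)
```

The point of the sketch is that the IID test data comes from the same period as the training data, while the NEAR and FAR test data come from progressively later periods, so a drop in performance from IID to FAR can be attributed to the shift over time.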

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| AnoShift | COPOD | ROC-AUC FAR | 50.42 | | Unverified |
| AnoShift | OC-SVM | ROC-AUC FAR | 49.57 | | Unverified |
| AnoShift | SO-GAAL | ROC-AUC FAR | 49.35 | | Unverified |
| AnoShift | ECOD (Li et al., 2022) | ROC-AUC FAR | 49.19 | | Unverified |
| AnoShift | LOF | ROC-AUC FAR | 34.96 | | Unverified |
| AnoShift | deepSVDD | ROC-AUC FAR | 34.53 | | Unverified |
| AnoShift | LUNAR | ROC-AUC FAR | 28.19 | | Unverified |
| AnoShift | BERT | ROC-AUC FAR | 28.15 | | Unverified |
| AnoShift | IsoForest | ROC-AUC FAR | 27.16 | | Unverified |
| AnoShift | Internal Contrastive Learning | ROC-AUC FAR | 22.45 | | Unverified |
| AnoShift | AE for anomalies | ROC-AUC FAR | 19.96 | | Unverified |
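To help read the ROC-AUC column, here is a minimal sketch of the metric. It assumes the values are percentages, labels are 1 for anomalies and 0 for normal samples, and higher scores mean "more anomalous". The function name `roc_auc` is illustrative, and the pairwise loop is written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (anomaly, normal) pairs ranked correctly.

    labels: 1 for anomaly, 0 for normal.
    scores: anomaly scores, higher means more anomalous.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four (anomaly, normal) pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"ROC-AUC: {100 * auc:.2f}")
```

Under this definition, 50 corresponds to a random ranking. A value below 50 means the detector scores normal samples as more anomalous than true anomalies more often than not.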
