AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection

2022-06-30

Marius Dragoi, Elena Burceanu, Emanuela Haller, Andrei Manolache, Florin Brad


Code

github.com/bit-ml/anoshift
Official, in paper, PyTorch, ★ 60

Abstract

Analyzing the distribution shift of data is a growing research direction in modern machine learning (ML), leading to new benchmarks that provide suitable scenarios for studying the generalization properties of ML models. Existing benchmarks focus on supervised learning and, to the best of our knowledge, there is none for unsupervised learning. Therefore, we introduce an unsupervised anomaly detection benchmark with data that shifts over time, built on Kyoto-2006+, a traffic dataset for network intrusion detection. This type of data meets the premise of a shifting input distribution: it covers a large time span (10 years), with naturally occurring changes over time (e.g., users modifying their behavior patterns, and software updates). We first highlight the non-stationary nature of the data, using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years. Next, we propose AnoShift, a protocol that splits the data into IID, NEAR, and FAR testing splits. We validate the performance degradation over time with diverse models, ranging from classical approaches to deep learning. Finally, we show that by acknowledging the distribution shift problem and properly addressing it, performance can be improved compared to classical training, which assumes independent and identically distributed data (on average, by up to 3% for our approach). Dataset and code are available at https://github.com/bit-ml/AnoShift/.
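
The IID / NEAR / FAR protocol described in the abstract can be pictured as a split by year: IID test data is held out from the training period, while NEAR and FAR test data come from progressively later years. The following is a minimal sketch of that idea, not the authors' implementation. The year boundaries, the `split_by_year` helper, the hold-out fraction, and the toy records are all assumptions made for illustration; the abstract does not state which years fall in each split.

```python
import random
from collections import defaultdict

# Hypothetical year boundaries, chosen only for illustration.
TRAIN_YEARS = range(2006, 2011)
NEAR_YEARS = range(2011, 2014)
FAR_YEARS = range(2014, 2016)


def split_by_year(records, iid_fraction=0.2, seed=0):
    """Partition (year, sample) records into train / iid / near / far lists.

    IID test data is held out from the same years used for training;
    NEAR and FAR test data come from later, non-overlapping years.
    """
    rng = random.Random(seed)
    splits = defaultdict(list)
    for year, sample in records:
        if year in TRAIN_YEARS:
            # Hold out a random fraction of the training period as the IID test set.
            name = "iid" if rng.random() < iid_fraction else "train"
        elif year in NEAR_YEARS:
            name = "near"
        elif year in FAR_YEARS:
            name = "far"
        else:
            continue  # ignore years outside the configured span
        splits[name].append((year, sample))
    return dict(splits)


# Toy records standing in for yearly traffic data (50 per year, 2006 to 2015).
records = [(year, f"flow-{year}-{i}") for year in range(2006, 2016) for i in range(50)]
splits = split_by_year(records)
```

A model would be fitted on `splits["train"]` only and then scored separately on the `iid`, `near`, and `far` lists, so that any drop in performance can be attributed to temporal distance from the training period.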

Tasks

Intrusion Detection, Network Intrusion Detection, Unsupervised Anomaly Detection

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
AnoShift	COPOD	ROC-AUC FAR	50.42	—	Unverified
AnoShift	OC-SVM	ROC-AUC FAR	49.57	—	Unverified
AnoShift	SO-GAAL	ROC-AUC FAR	49.35	—	Unverified
AnoShift	ECOD Li et al. (2022)	ROC-AUC FAR	49.19	—	Unverified
AnoShift	LOF	ROC-AUC FAR	34.96	—	Unverified
AnoShift	deepSVDD	ROC-AUC FAR	34.53	—	Unverified
AnoShift	LUNAR	ROC-AUC FAR	28.19	—	Unverified
AnoShift	BERT	ROC-AUC FAR	28.15	—	Unverified
AnoShift	IsoForest	ROC-AUC FAR	27.16	—	Unverified
AnoShift	Internal Contrastive Learning	ROC-AUC FAR	22.45	—	Unverified
AnoShift	AE for anomalies	ROC-AUC FAR	19.96	—	Unverified
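
The ROC-AUC values in the table appear to be percentages, and most fall below 50. As a reading aid, the sketch below computes ROC-AUC as the fraction of (anomaly, normal) pairs in which the anomaly receives the higher score. It shows that a value below 0.5 means the scores rank anomalies below normal samples more often than not, and that negating the scores yields one minus the original value. The `roc_auc` helper and the toy labels and scores are illustrative assumptions, not the benchmark's evaluation code.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random anomaly outscores a random normal sample.

    labels: 1 for anomaly, 0 for normal. scores: higher means more anomalous.
    Ties count as half a win. Runs in O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: the scores rank most normal samples above the anomalies.
labels = [0, 0, 0, 1, 1]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]

auc = roc_auc(labels, scores)                    # 1 of 6 pairs ranked correctly
flipped = roc_auc(labels, [-s for s in scores])  # negated scores: 5 of 6 pairs
```

Here `auc` is 1/6 and `flipped` is 5/6, so the two always sum to 1. Read this way, a FAR score far below 50 suggests that what the model learned as "anomalous" on the training years has become inverted on the later years, rather than merely uninformative.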
