WEAC: Word embeddings for anomaly classification from event logs
Amit Pande, Vishal Ahuja
Abstract: Dramatic progress has been made in recent years in the use of semantic word embeddings to solve word-analogy tasks. Word embeddings, i.e., vector representations of words, have been key to many advances in natural language processing. This paper presents a novel application of Word Embeddings for Anomaly Classification (WEAC), which detects whether an event-log entry is anomalous. In addition, WEAC classifies the anomaly by identifying the anomalous feature(s) in the log entry. For example, unusual network activity such as a store transaction server logging into dropbox.com would be flagged automatically because of the improbable feature associations in the corresponding log entry. WEAC works with two training models, Skip-Gram (SG) and Continuous Bag of Words (CBOW), and uses negative sampling to accelerate training. Initial results on the Wikipedia text8 dataset, as well as experiments on enterprise HTTP logs, are promising: the model achieved an average detection rate of 65-100% and a classification accuracy of 85-100%, and its detection rate was superior to state-of-the-art anomaly detection techniques.
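The core idea can be sketched in a few lines: treat each feature value in a log entry (host, port, destination domain, ...) as a "word", and flag an entry whose features rarely co-occur in the embedding space. This is a minimal illustration of the scoring step only, assuming embeddings have already been trained; the vectors, token names, and thresholds below are hand-crafted toy values, not the paper's trained SG/CBOW model.

```python
import math

# Toy embeddings standing in for vectors learned by skip-gram / CBOW
# with negative sampling. These three-dimensional vectors are invented
# for illustration; real embeddings would be learned from the log corpus.
EMB = {
    "pos-server":           (0.90, 0.10, 0.00),
    "port-443":             (0.80, 0.20, 0.10),
    "payments.example.com": (0.85, 0.15, 0.05),  # hypothetical domain
    "dropbox.com":          (0.00, 0.10, 0.95),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def entry_score(tokens):
    """Mean pairwise cosine similarity of an entry's feature tokens.
    A low score means the features rarely co-occur -> likely anomalous."""
    pairs = [(a, b) for i, a in enumerate(tokens) for b in tokens[i + 1:]]
    return sum(cosine(EMB[a], EMB[b]) for a, b in pairs) / len(pairs)

def anomalous_feature(tokens):
    """The token least similar, on average, to the rest of the entry."""
    def avg_sim(t):
        others = [o for o in tokens if o != t]
        return sum(cosine(EMB[t], EMB[o]) for o in others) / len(others)
    return min(tokens, key=avg_sim)

normal = ["pos-server", "port-443", "payments.example.com"]
odd    = ["pos-server", "port-443", "dropbox.com"]
print(entry_score(normal) > entry_score(odd))  # the odd entry scores lower
print(anomalous_feature(odd))                  # -> dropbox.com
```

Classification then falls out of the same similarity structure: the feature that is least similar to its co-occurring features is reported as the anomalous one, which is how the store-server-to-dropbox.com example above would be explained.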