WEAC: Word embeddings for anomaly classification from event logs
Amit Pande, Vishal Ahuja
Abstract: Dramatic progress has been made in recent years in the use of semantic word embeddings to solve word-analogy tasks. Word embeddings, i.e., vector representations of words, have been key to many advances in natural language processing. This paper presents a novel application of Word Embeddings for Anomaly Classification (WEAC), which detects whether an event-log entry is anomalous. In addition, WEAC classifies the anomaly by identifying the anomalous feature(s) in the log entry. For example, unusual network activity such as a store transaction server logging into dropbox.com would be flagged automatically because of the improbable feature associations in the corresponding log entry. WEAC works with two training models, Skip-Gram (SG) and Continuous Bag of Words (CBOW), and uses negative sampling to accelerate training. Initial results on the Wikipedia text8 dataset, as well as experiments on enterprise HTTP logs, are promising: the model achieved an average detection rate of 65-100% and a classification accuracy of 85-100%, and its detection rate was superior to state-of-the-art anomaly detection techniques.
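The core idea can be sketched in a few lines: treat each feature value in a log entry (host, port, destination domain, ...) as a "word", and flag an entry whose features rarely co-occur in the embedding space. This is a minimal illustration of the scoring step only, assuming embeddings have already been trained; the vectors, token names, and thresholds below are hand-crafted toy values, not the paper's trained SG/CBOW model.

```python
import math

# Toy embeddings standing in for vectors learned by skip-gram / CBOW
# with negative sampling. These three-dimensional vectors are invented
# for illustration; real embeddings would be learned from the log corpus.
EMB = {
    "pos-server":           (0.90, 0.10, 0.00),
    "port-443":             (0.80, 0.20, 0.10),
    "payments.example.com": (0.85, 0.15, 0.05),  # hypothetical domain
    "dropbox.com":          (0.00, 0.10, 0.95),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def entry_score(tokens):
    """Mean pairwise cosine similarity of an entry's feature tokens.
    A low score means the features rarely co-occur -> likely anomalous."""
    pairs = [(a, b) for i, a in enumerate(tokens) for b in tokens[i + 1:]]
    return sum(cosine(EMB[a], EMB[b]) for a, b in pairs) / len(pairs)

def anomalous_feature(tokens):
    """The token least similar, on average, to the rest of the entry."""
    def avg_sim(t):
        others = [o for o in tokens if o != t]
        return sum(cosine(EMB[t], EMB[o]) for o in others) / len(others)
    return min(tokens, key=avg_sim)

normal = ["pos-server", "port-443", "payments.example.com"]
odd    = ["pos-server", "port-443", "dropbox.com"]
print(entry_score(normal) > entry_score(odd))  # the odd entry scores lower
print(anomalous_feature(odd))                  # -> dropbox.com
```

Classification then falls out of the same similarity structure: the feature that is least similar to its co-occurring features is reported as the anomalous one, which is how the store-server-to-dropbox.com example above would be explained.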