Automatic Labeling for Entity Extraction in Cyber Security

2013-08-22

Robert A. Bridges, Corinne L. Jones, Michael D. Iannacone, Kelly M. Testa, John R. Goodall


Code

github.com/stucco/auto-labeled-corpus (official, linked in paper; framework: none; 0 stars)
github.com/IS5882/Open-CyKG (framework: TensorFlow; 89 stars)
github.com/ShashSec/SMTI_SA (framework: none; 0 stars)

Abstract

Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications such as detecting security-related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a highly precise method that automatically labels text from several data sources by leveraging related, domain-specific, structured data, and we provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the averaged perceptron on a portion of our corpus (750,000 words) and achieve near-perfect precision, recall, and accuracy, with training times under 17 seconds.
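The abstract names two technical pieces: automatic labeling driven by structured data, and a maximum entropy model trained with the averaged perceptron. The Python sketch below illustrates both under stated assumptions. The labeling step is assumed to be greedy longest-match lookup against a gazetteer built from structured data; `GAZETTEER`, its entries, and the `sw.vendor`/`sw.product` labels are hypothetical, not the paper's schema. The learner is a generic multiclass averaged perceptron over a small, assumed feature set (`features`), which yields a linear scoring model of the same form a maximum entropy classifier uses; it is not the authors' implementation, toolkit, or features.

```python
from collections import defaultdict

# Hypothetical gazetteer derived from structured data (e.g. a product or
# vulnerability database). Entries and label names are illustrative assumptions.
GAZETTEER = {
    ("microsoft",): "sw.vendor",
    ("internet", "explorer"): "sw.product",
}


def auto_label(tokens, gazetteer):
    """Greedy longest-match labeling; tokens with no match are labeled 'O'."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max(len(k) for k in gazetteer)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                labels[i:i + n] = [label] * n
                i += n
                break
        else:  # no gazetteer entry starts at position i
            i += 1
    return labels


def features(tokens, i):
    """Minimal context features for token i (an assumed feature set)."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return ["bias", "w=" + tokens[i].lower(), "prev=" + prev, "next=" + nxt]


class AveragedPerceptron:
    """Multiclass perceptron whose final weights are averaged over all steps."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = defaultdict(float)  # (feature, class) -> current weight
        self.totals = defaultdict(float)   # weight summed over elapsed steps
        self.stamps = defaultdict(int)     # step of each key's last update
        self.t = 0                         # number of examples processed

    def predict(self, feats):
        def score(c):
            return sum(self.weights.get((f, c), 0.0) for f in feats)
        return max(self.classes, key=score)  # ties go to the first class

    def _update(self, key, delta):
        # Credit the old weight for the steps it was in effect, then change it.
        self.totals[key] += (self.t - self.stamps[key]) * self.weights[key]
        self.stamps[key] = self.t
        self.weights[key] += delta

    def train(self, examples, epochs=10):
        for _ in range(epochs):
            for feats, gold in examples:
                guess = self.predict(feats)
                if guess != gold:
                    for f in feats:
                        self._update((f, gold), 1.0)
                        self._update((f, guess), -1.0)
                self.t += 1
        # Replace each weight by its average over all t training steps.
        self.weights = {
            key: (self.totals[key] + (self.t - self.stamps[key]) * w) / self.t
            for key, w in self.weights.items()
        }


# Toy usage: auto-label two sentences, then train on the generated labels.
corpus = [
    "Microsoft Internet Explorer 8 allows remote attackers".split(),
    "Attackers target Microsoft products".split(),
]
examples = []
for sent in corpus:
    for i, label in enumerate(auto_label(sent, GAZETTEER)):
        examples.append((features(sent, i), label))

model = AveragedPerceptron({label for _, label in examples})
model.train(examples, epochs=10)
test = "Microsoft patched Internet Explorer".split()
print([model.predict(features(test, i)) for i in range(len(test))])
```

Averaging the weights over every training step, rather than keeping only the final weights, damps the perceptron's sensitivity to its last few updates; the lazy `totals`/`stamps` bookkeeping avoids re-summing every weight on every step. On a corpus this small the model can still mislabel tokens whose features it has barely seen.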

Tasks

Entity Extraction using GAN
