Malware Knowledge Graph Generation
Sharmishtha Dutta, Nidhi Rastogi, Destin Yee, Chuqiao Gu, Qicheng Ma
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
Cyber threat and attack intelligence information are available in non-standard format from heterogeneous sources. Comprehending them and utilizing them for threat intelligence extraction requires engaging security experts. Knowledge graphs enable converting this unstructured information from heterogeneous sources into a structured representation of data and factual knowledge for several downstream tasks such as predicting missing information and future threat trends. Existing large-scale knowledge graphs mainly focus on general classes of entities and relationships between them. Open-source knowledge graphs for the security domain do not exist. To fill this gap, we've built TINKER - a knowledge graph for threat intelligence (Threat INtelligence KnowlEdge gRaph). TINKER is generated using RDF triples describing entities and relations from tokenized unstructured natural language text from 83 threat reports published between 2006-2021. We built TINKER using classes and properties defined by open-source malware ontology and using hand-annotated RDF triples. We also discuss ongoing research and challenges faced while creating TINKER.