SpecNFS: A Challenge Dataset Towards Extracting Formal Models from Natural Language Specifications

2022-06-01 · LREC 2022 · Code available

Sayontan Ghosh, Amanpreet Singh, Alex Merenstein, Wei Su, Scott A. Smolka, Erez Zadok, Niranjan Balasubramanian

Code

github.com/stonybrooknlp/specnfs (official)

Abstract

Can NLP assist in building formal models for verifying complex systems? We study this challenge in the context of parsing Network File System (NFS) specifications. We define a semantic-dependency problem over SpecIR, a representation language we introduce to model sentences appearing in NFS specification documents (RFCs) as IF-THEN statements, and present an annotated dataset of 1,198 sentences. We develop and evaluate semantic-dependency parsing systems for this problem. Evaluations show that even with a state-of-the-art language model there is significant room for improvement: the best models achieve F1 scores of only 60.5 and 33.3 on the named-entity-recognition and dependency-link-prediction sub-tasks, respectively. We also release additional unlabeled data and other domain-related texts. Experiments show that these additional resources increase F1 when used in simple domain-adaptation and transfer-learning approaches, suggesting fruitful directions for further research.
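The two sub-task scores reported in the abstract are F1 measures. As a rough illustration of how such scores could be computed, the sketch below treats entities and dependency links as sets of tuples and scores them by exact match. The tuple layouts, the label names (ARG, CONDITION, ACTION, agent, if), and the example sentence are assumptions made for illustration only; they are not the SpecIR schema or the authors' evaluation script, which may use a different matching criterion or averaging scheme.

```python
from typing import Hashable, Set


def f1_score(gold: Set[Hashable], pred: Set[Hashable]) -> float:
    """Exact-match F1 between a gold set and a predicted set of items."""
    if not gold and not pred:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: "If the file does not exist, the server returns an error."
# Entities are (start_token, end_token, label); labels are illustrative only.
gold_entities = {(1, 3, "ARG"), (4, 7, "CONDITION"), (8, 10, "ARG"), (10, 11, "ACTION")}
pred_entities = {(1, 3, "ARG"), (4, 7, "CONDITION"), (8, 10, "ACTION")}

# Links are (head, dependent, label); again a made-up labeling scheme.
gold_links = {("returns", "server", "agent"), ("returns", "exist", "if")}
pred_links = {("returns", "server", "agent")}

entity_f1 = f1_score(gold_entities, pred_entities)
link_f1 = f1_score(gold_links, pred_links)
print(f"entity F1 = {entity_f1:.3f}, link F1 = {link_f1:.3f}")
```

Under exact matching, a predicted link counts as correct only if its head, dependent, and label all agree with the gold annotation, which is one plausible reason link scores can sit well below entity scores.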

Tasks

Dependency Parsing, Domain Adaptation, Language Modeling, Link Prediction, Named Entity Recognition (NER), Semantic Dependency Parsing, Transfer Learning
