Extracting Definienda in Mathematical Scholarly Articles with Transformers
Shufan Jiang, Pierre Senellart
- github.com/sufianj/def_extraction (official, linked in the paper)
- github.com/PierreSenellart/theoremkb
Abstract
We consider automatically identifying the defined term within a mathematical definition from the text of an academic article. Inspired by the development of transformer-based natural language processing applications, we pose the problem as (a) a token-level classification task using fine-tuned pre-trained transformers; and (b) a question-answering task using a generalist large language model (GPT). We also propose a rule-based approach to build a labeled dataset from the LaTeX source of papers. Experimental results show that it is possible to reach high levels of precision and recall using either the recent (and expensive) GPT-4 or simpler pre-trained models fine-tuned on our task.
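The abstract does not spell out the labeling rules, so the following is only a minimal sketch under a stated assumption. It assumes (hypothetically) that authors mark the defined term with \emph, \textit or \textbf inside a `definition` environment. Under that assumption, it turns LaTeX source into token-level BIO labels of the kind a token classifier could be fine-tuned on. The function names `label_source` and `label_definition` and the tag set `B-DEF`/`I-DEF`/`O` are illustrative and are not taken from the paper or its repository.

```python
import re

# Hypothetical rule (an assumption, not the paper's exact heuristic):
# a definiendum is any \emph{...}, \textit{...} or \textbf{...} span
# that appears inside a \begin{definition} ... \end{definition} environment.
DEF_ENV = re.compile(r"\\begin\{definition\}(.*?)\\end\{definition\}", re.DOTALL)
# [^{}]* deliberately skips spans with nested braces to keep the sketch simple.
EMPH = re.compile(r"\\(?:emph|textit|textbf)\{([^{}]*)\}")


def label_definition(body: str) -> list[tuple[str, str]]:
    """Turn one definition body into (token, BIO-tag) pairs using whitespace tokens."""
    pairs: list[tuple[str, str]] = []
    pos = 0
    for m in EMPH.finditer(body):
        # Tokens before the emphasized span lie outside any definiendum.
        for tok in body[pos:m.start()].split():
            pairs.append((tok, "O"))
        # The first token of the span starts a definiendum; the rest continue it.
        for i, tok in enumerate(m.group(1).split()):
            pairs.append((tok, "B-DEF" if i == 0 else "I-DEF"))
        pos = m.end()
    # Tokens after the last emphasized span are also outside.
    for tok in body[pos:].split():
        pairs.append((tok, "O"))
    return pairs


def label_source(latex: str) -> list[list[tuple[str, str]]]:
    """Label every definition environment found in a LaTeX source string."""
    return [label_definition(m.group(1)) for m in DEF_ENV.finditer(latex)]


if __name__ == "__main__":
    src = r"\begin{definition} A subgroup $N$ is a \emph{normal subgroup} if $gN=Ng$. \end{definition}"
    for token, tag in label_source(src)[0]:
        print(token, tag)
```

Real papers also use custom environment names (for example `defn`) and other ways of marking the defined term, so a practical labeler would need more rules. Whitespace tokens would also have to be aligned to a transformer's subword tokens before fine-tuning.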