Extracting Definienda in Mathematical Scholarly Articles with Transformers

2023-11-21

Shufan Jiang, Pierre Senellart

Code

github.com/sufianj/def_extraction (official)
github.com/PierreSenellart/theoremkb

Abstract

We consider the task of automatically identifying the defined term (the definiendum) within a mathematical definition in the text of an academic article. Inspired by the development of transformer-based natural language processing applications, we pose the problem as (a) a token-level classification task using fine-tuned pre-trained transformers; and (b) a question-answering task using a generalist large language model (GPT). We also propose a rule-based approach to build a labeled dataset from the LaTeX source of papers. Experimental results show that it is possible to reach high levels of precision and recall using either the recent (and expensive) GPT-4 or simpler pre-trained models fine-tuned on our task.
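The abstract does not spell out the labeling rules, so the following is only a minimal sketch of how a rule-based labeler over LaTeX source could produce token-level training data. It assumes, hypothetically, that each definition sits in a `definition` environment and that the definiendum is the first term wrapped in `\emph{}`, `\textit{}` or `\textbf{}`. Tokenization is plain whitespace splitting, and the tag names `B-DEF`/`I-DEF` and all function names are illustrative, not taken from the paper or its repositories. The `decode_spans` helper shows how per-token tags, such as those a fine-tuned token classifier would predict, map back to extracted terms.

```python
import re

# Hypothetical rule: a definition lives in \begin{definition}...\end{definition}.
DEF_ENV = re.compile(r"\\begin\{definition\}(.*?)\\end\{definition\}", re.DOTALL)
# Hypothetical rule: the definiendum is the first emphasized term in the body.
EMPH = re.compile(r"\\(?:emph|textit|textbf)\{([^{}]*)\}")


def label_definition(body: str) -> list[tuple[str, str]]:
    """Turn one definition body into (token, BIO tag) pairs; [] if no term found."""
    match = EMPH.search(body)
    if match is None or not match.group(1).split():
        return []
    before = body[: match.start()].split()
    term = match.group(1).split()
    after = body[match.end():].split()
    tokens = before + term + after
    tags = (["O"] * len(before)
            + ["B-DEF"] + ["I-DEF"] * (len(term) - 1)
            + ["O"] * len(after))
    return list(zip(tokens, tags))


def build_dataset(source: str) -> list[list[tuple[str, str]]]:
    """Collect one labeled example per definition environment that matches the rule."""
    examples = []
    for body in DEF_ENV.findall(source):
        pairs = label_definition(body)
        if pairs:
            examples.append(pairs)
    return examples


def decode_spans(pairs: list[tuple[str, str]]) -> list[str]:
    """Recover contiguous B-DEF/I-DEF runs as term strings."""
    spans, current = [], []
    for token, tag in pairs:
        if tag == "B-DEF":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-DEF" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


SAMPLE = r"""
\begin{definition}
A topological space $X$ is called \emph{compact} if every open cover of $X$ has a finite subcover.
\end{definition}
\begin{definition}
A \emph{normal subgroup} of $G$ is a subgroup invariant under conjugation.
\end{definition}
"""

if __name__ == "__main__":
    for example in build_dataset(SAMPLE):
        print(decode_spans(example))  # prints ['compact'], then ['normal subgroup']
```

Real LaTeX sources are far messier than this (macros, nested braces, definitions without emphasis), so a practical labeler would need more rules and would leave LaTeX commands in the tokens only if the downstream model is meant to see them.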

Tasks

Articles, Language Modeling, Large Language Model, Question Answering