An Evaluation of Binary Comparative Lexical Complexity Models

2022-07-01 · NAACL (BEA) 2022

Kai North, Marcos Zampieri, Matthew Shardlow

Abstract

Identifying complex words in texts is an important first step in text simplification (TS) systems. In this paper, we investigate the performance of binary comparative Lexical Complexity Prediction (LCP) models applied to a popular benchmark dataset: the CompLex 2.0 dataset used in SemEval-2021 Task 1. With the data from CompLex 2.0, we create a new dataset containing 1,940 sentences, referred to as CompLex-BC. Using CompLex-BC, we train multiple models to predict which of two target words in the same sentence is the more complex. A linear SVM model achieved the best performance in our experiments, with an F1-score of 0.86.
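To make the binary comparative setup concrete, the following is a minimal sketch in Python, not the authors' pipeline. It assumes that each record carries a sentence identifier, a target word, and a continuous complexity score, and that pairs are formed from target words sharing a sentence. The records, the scores, the tie handling, the `length_baseline` stand-in for a trained model, and the positive-class F1 definition are all hypothetical choices made for illustration; the abstract does not say how CompLex-BC was built or which F1 variant was reported.

```python
from itertools import combinations

# Hypothetical toy records: (sentence_id, target_word, complexity score in [0, 1]).
# These scores are invented for illustration and are not taken from CompLex 2.0.
records = [
    ("s1", "river", 0.10),
    ("s1", "estuary", 0.45),
    ("s2", "law", 0.15),
    ("s2", "jurisprudence", 0.70),
    ("s2", "court", 0.20),
]


def build_pairs(records):
    """Pair target words that share a sentence; label 1 if the first is more complex."""
    by_sentence = {}
    for sent_id, word, score in records:
        by_sentence.setdefault(sent_id, []).append((word, score))

    pairs = []
    for sent_id, targets in by_sentence.items():
        for (w1, s1), (w2, s2) in combinations(targets, 2):
            if s1 == s2:
                continue  # assumption: ties carry no binary label, so skip them
            pairs.append((sent_id, w1, w2, int(s1 > s2)))
    return pairs


def f1_score(gold, pred, positive=1):
    """F1 for the positive class, computed from true/false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def length_baseline(w1, w2):
    """Naive stand-in for a trained model: predict that the longer word is more complex."""
    return int(len(w1) > len(w2))


pairs = build_pairs(records)
gold = [label for _, _, _, label in pairs]
pred = [length_baseline(w1, w2) for _, w1, w2, _ in pairs]
print(len(pairs), "pairs; F1 =", f1_score(gold, pred))
```

In a real experiment the length heuristic would be replaced by a trained classifier, such as a linear SVM over features of both target words, and evaluated on held-out pairs.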
