BERT Embeddings for Automatic Readability Assessment

2021-06-15 · RANLP 2021 · Code Available

Joseph Marvin Imperial

Abstract

Automatic readability assessment (ARA) is the task of evaluating how easy or difficult text documents are for a target audience. One of the many open problems in the field is making models trained for this task effective for low-resource languages as well. In this study, we propose an alternative way of using the information-rich embeddings of BERT models: combining them with handcrafted linguistic features for readability assessment. Results show that the proposed method outperforms classical approaches to readability assessment on English and Filipino datasets, with up to a 12.4% increase in F1 score. We also show that the general information encoded in BERT embeddings can serve as a substitute feature set for low-resource languages such as Filipino, which have limited semantic and syntactic NLP tools for explicitly extracting feature values for the task.
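The combined method described in the abstract can be pictured as concatenating a document-level BERT embedding with a vector of handcrafted features before classification. The following is a minimal sketch under stated assumptions, not the paper's implementation: the embedding is a precomputed placeholder list (a real BERT-base vector has 768 dimensions), and the three surface features in the hypothetical `handcrafted_features` helper are illustrative only, not the paper's actual feature set. The hypothetical `standardize` helper z-scores each column so that count-based features and embedding dimensions sit on comparable scales.

```python
import math
import re
from typing import List, Sequence


def handcrafted_features(text: str) -> List[float]:
    """Illustrative surface features (hypothetical; not the paper's feature set)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    avg_sentence_len = len(words) / n_sent  # words per sentence
    avg_word_len = sum(len(w) for w in words) / n_words  # characters per word
    type_token_ratio = len({w.lower() for w in words}) / n_words  # lexical variety
    return [avg_sentence_len, avg_word_len, type_token_ratio]


def combine(embedding: Sequence[float], features: Sequence[float]) -> List[float]:
    """Concatenate a precomputed BERT embedding with handcrafted features."""
    return list(embedding) + list(features)


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each column across documents; constant columns are only centered."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Usage with a placeholder 3-d "embedding" (a real BERT-base vector is 768-d).
emb = [0.12, -0.40, 0.33]
text = "The cat sat. The dog ran away quickly."
vec = combine(emb, handcrafted_features(text))
print(vec)
```

For a low-resource language, the same `combine` step applies; the abstract's point is that the embedding part can stand in when tools for extracting richer handcrafted features are unavailable.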

Benchmark Results

Dataset | Model | Metric | Claimed | Verified | Status
OneStopEnglish (Readability Assessment) | Logistic Regression | Accuracy (5-fold) | 0.74 |  | Unverified
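The benchmark row reports accuracy under 5-fold cross-validation. A minimal sketch of how such a score can be computed, assuming a single seeded shuffle followed by contiguous (non-stratified) folds, and assuming the score is the mean of per-fold accuracies; whether the reported 0.74 was averaged per fold or pooled is not stated here. `fit_predict` is a generic callable, and `majority_baseline` is a hypothetical stand-in classifier, not the Logistic Regression model from the table.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

FitPredict = Callable[[List[Sequence[float]], List[str], List[Sequence[float]]], List[str]]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once, then cut them into k near-equal test folds (assumes n >= k)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    splits = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        splits.append((train, test))
        start += size
    return splits


def cv_accuracy(X: List[Sequence[float]], y: List[str], fit_predict: FitPredict, k: int = 5) -> float:
    """Mean of per-fold accuracies over k train/test splits."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Toy usage: with a single label, every fold is predicted perfectly.
X = [[float(i)] for i in range(10)]
y = ["A"] * 10
print(cv_accuracy(X, y, majority_baseline))
```

In practice a stratified split, which keeps each readability level's proportion similar across folds, is a common choice for imbalanced label sets; this sketch keeps the plain shuffled split for brevity.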
