Everybody likes short sentences - A Data Analysis for the Text Complexity DE Challenge 2022

2022-09-01 · GermEval 2022

Ulf A. Hamster

Abstract

The German Text Complexity Assessment Shared Task at KONVENS 2022 explores how to predict a complexity score for sentence examples from the language learners’ perspective. Our modeling approach for this shared task uses off-the-shelf NLP tools for feature engineering and a Random Forest regression model. We identified text length, or more precisely the logarithm of a sentence’s string length, as the most important feature for predicting the complexity score. Further analysis showed that the Pearson correlation between text length and complexity score is approximately 0.777. A sensitivity analysis on the loss function revealed that semantic SBert features affect the complexity score as well.
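As a rough illustration of the length feature and the correlation analysis described in the abstract, the following sketch computes the logarithm of each sentence's string length and its Pearson correlation with a complexity score. The sentences and scores are invented toy values, not data from the shared task, and the helper names `log_length` and `pearson` are assumptions made for this example rather than the author's code. The abstract does not state whether the reported correlation uses the raw or the log-transformed length, so both are computed here.

```python
import math


def log_length(sentence: str) -> float:
    """Natural logarithm of the sentence's length in characters.

    Assumes a non-empty sentence; math.log(0) would raise ValueError.
    """
    return math.log(len(sentence))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy examples: (sentence, complexity score). Not real task data.
examples = [
    ("Das ist gut.", 1.2),
    ("Der Hund schläft im Garten.", 1.8),
    ("Obwohl es regnete, gingen wir trotzdem spazieren.", 3.1),
    ("Die Verordnung regelt die Zuständigkeiten der beteiligten Behörden "
     "im Rahmen des Genehmigungsverfahrens.", 5.4),
]

scores = [score for _, score in examples]
raw_lengths = [float(len(sentence)) for sentence, _ in examples]
log_lengths = [log_length(sentence) for sentence, _ in examples]

r_raw = pearson(raw_lengths, scores)
r_log = pearson(log_lengths, scores)
print(f"Pearson r (raw length): {r_raw:.3f}")
print(f"Pearson r (log length): {r_log:.3f}")
```

On real data, the same feature could be passed to a regression model alongside the other engineered features; the correlation alone only indicates how much of the score a single length feature already tracks.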

Tasks

Feature Engineering, Regression, Sensitivity, Sentence
