Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences

2020-05-01 · LREC 2020

Evelina Rennes

Abstract

Parallel monolingual resources are imperative for data-driven sentence simplification research. We present work on aligning, at the sentence level, a corpus of web texts from all Swedish public authorities and municipalities, written in standard and simple Swedish. We compare the performance of three alignment algorithms used for similar work in English (Average Alignment, Maximum Alignment, and Hungarian Alignment), and use the best-performing algorithm to create a resource of 15,433 unique sentence pairs. We evaluate the resulting corpus using a set of features that has been shown to predict the text complexity of Swedish texts. The results show that, according to many of the text complexity measures, the sentences of the simple sub-corpus are indeed less complex than those of the standard part of the corpus.
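The abstract names the three alignment algorithms without defining them. As a rough illustration only, the sketch below follows the way these sentence-similarity measures are commonly described in the English simplification literature, where each operates on a matrix of word-to-word similarities between two sentences. The function names, the input format, and the brute-force matching are assumptions made for this example, and the paper's own formulation and implementation may differ.

```python
from itertools import permutations

# `sim` is a hypothetical word-similarity matrix: sim[i][j] is the similarity
# between word i of the standard sentence and word j of the simple sentence
# (for example, cosine similarity of word embeddings).

def average_alignment(sim):
    """Mean of all pairwise word similarities."""
    cells = [value for row in sim for value in row]
    return sum(cells) / len(cells)

def maximum_alignment(sim):
    """Average, in both directions, of each word's best match."""
    row_best = sum(max(row) for row in sim) / len(sim)
    col_best = sum(max(col) for col in zip(*sim)) / len(sim[0])
    return (row_best + col_best) / 2

def hungarian_alignment(sim):
    """Score of the best one-to-one word matching, normalised by the
    shorter sentence length. Brute force is used here for clarity; a real
    implementation would use the Hungarian algorithm instead."""
    if len(sim) > len(sim[0]):
        # Transpose so that rows correspond to the shorter sentence.
        sim = [list(col) for col in zip(*sim)]
    n, m = len(sim), len(sim[0])
    best = max(
        sum(sim[i][match[i]] for i in range(n))
        for match in permutations(range(m), n)
    )
    return best / n

# Toy example: a three-word sentence compared with a two-word sentence.
example = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.4]]
scores = {
    "average": average_alignment(example),
    "maximum": maximum_alignment(example),
    "hungarian": hungarian_alignment(example),
}
```

In a setup like this, a sentence pair would typically be kept as aligned when its score exceeds a chosen threshold; the threshold and the source of the word similarities are left open here because the abstract does not specify them.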