A Corpus for Automatic Readability Assessment and Text Simplification of German

2019-09-19 · LREC 2020

Alessia Battisti, Sarah Ebling

Abstract

In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification of German. The corpus is compiled from web sources and consists of approximately 211,000 sentences. As a novel contribution, it contains information on text structure, typography, and images, which can be exploited as part of machine learning approaches to readability assessment and text simplification. The focus of this publication is on representing such information as an extension to an existing corpus standard.
