Corpus REDEWIEDERGABE

2020-05-01 · LREC 2020

Annelen Brunner, Stefan Engelberg, Fotis Jannidis, Ngoc Duyen Tanja Tu, Lukas Weimer


Abstract

This article presents corpus REDEWIEDERGABE, a German-language historical corpus with detailed annotations for speech, thought and writing representation (ST&WR). With approximately 490,000 tokens, it is the largest resource of its kind. It can be used to answer literary and linguistic research questions and serve as training material for machine learning. This paper describes the composition of the corpus and the annotation structure, discusses some methodological decisions and gives basic statistics about the forms of ST&WR found in this corpus.

Tasks

BIG-bench Machine Learning