A corpus for Automatic Article Analysis

2022-09-01 · CLIB 2022

Elena Callegari, Desara Xhura

Abstract

We describe the structure and creation of the SageWrite corpus. This is a manually annotated corpus created to support automatic language generation and automatic quality assessment of academic articles. The corpus currently contains annotations for 100 excerpts taken from various scientific articles. For each of these excerpts, the corpus contains (i) a draft version of the excerpt and (ii) annotations that reflect the stylistic and linguistic merits of the excerpt, such as whether or not the text is clearly structured. The SageWrite corpus is the first corpus for the fine-tuning of text-generation algorithms that specifically addresses academic writing.
