
MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions

2019-09-26 · WS 2019

Christina Niklaus, Andre Freitas, Siegfried Handschuh


Abstract

We compiled a new sentence splitting corpus composed of 203K pairs of aligned complex source and simplified target sentences. In contrast to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset in which each input sentence is broken down into a set of minimal propositions, i.e., a sequence of sound, self-contained utterances, each presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. This corpus is useful for developing sentence splitting approaches that learn to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences with a simple and more regular structure. Such a representation is easier for downstream applications to process, which in turn facilitates and improves their performance.
