CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups

2017-11-01 · IJCNLP 2017

Seid Muhie Yimam, Sanja Štajner, Martin Riedl, Chris Biemann

Abstract

Complex word identification (CWI) is an important task in text accessibility. However, due to the scarcity of CWI datasets, previous studies have only addressed this problem on Wikipedia sentences and have solely taken into account the needs of non-native English speakers. We collect a new CWI dataset (CWIG3G2) covering three text genres (News, WikiNews, and Wikipedia) annotated by both native and non-native English speakers. Unlike previous datasets, we cover single words, as well as complex phrases, and present them for judgment in a paragraph context. We present the first study on cross-genre and cross-group CWI, showing measurable influences of native language and genre type.
