Multilingual and Cross-Lingual Complex Word Identification

2017-09-01 · RANLP 2017

Seid Muhie Yimam, Sanja Štajner, Martin Riedl, Chris Biemann

Abstract

Complex Word Identification (CWI) is an important task in lexical simplification and text accessibility. Due to the lack of CWI datasets, previous work has largely depended on Simple English Wikipedia and edit histories to obtain 'gold standard' annotations, which are of doubtful quality and limited to English. We collect complex words/phrases (CP) for English, German and Spanish, annotated by both native and non-native speakers, and propose language-independent features that can be used to train multilingual and cross-lingual CWI models. We show that the performance of cross-lingual CWI systems (using a model trained on one language and applying it to the other languages) is comparable to that of monolingual CWI systems.
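The cross-lingual setting described above (train on one language, apply the model unchanged to others) can be illustrated with a small sketch. This is not the authors' system: the two features (character length and a vowel-group count used as a rough syllable proxy), the nearest-centroid classifier, the helper names `features`, `train_centroids` and `predict`, and the toy English word list are all hypothetical choices made here for brevity.

```python
import math
import re
from statistics import mean

# Hypothetical language-independent features (NOT the paper's feature set):
# length in characters and number of vowel groups (a crude syllable proxy).
# The vowel class covers English, German and Spanish vowels only (assumption).
VOWEL_GROUP = re.compile(r"[aeiouyäöüáéíóú]+")


def features(word):
    """Map a word to (length, vowel_groups); identical for every language."""
    w = word.lower()
    return float(len(w)), float(len(VOWEL_GROUP.findall(w)))


def train_centroids(labelled):
    """Nearest-centroid 'training': average feature vectors per class.

    `labelled` is a list of (word, label) pairs, 1 = complex, 0 = simple.
    Pairs from several languages can be pooled for a multilingual model.
    """
    centroids = {}
    for label in (0, 1):
        vecs = [features(w) for w, y in labelled if y == label]
        centroids[label] = tuple(mean(dim) for dim in zip(*vecs))
    return centroids


def predict(centroids, word):
    """Return the label whose centroid is closest (Euclidean) to the word."""
    x = features(word)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Toy, hand-picked English examples for illustration only (not the paper's data).
english = [("cat", 0), ("dog", 0), ("run", 0), ("house", 0), ("big", 0),
           ("photosynthesis", 1), ("incomprehensible", 1), ("ubiquitous", 1),
           ("metamorphosis", 1), ("juxtaposition", 1)]

model = train_centroids(english)  # trained on English only

# Cross-lingual use: apply the English-trained model to German and Spanish words.
for word in ["Haus", "Unabhängigkeitserklärung", "casa", "electrodoméstico"]:
    print(word, predict(model, word))
```

Because both features are computed the same way for any language whose vowels the pattern covers, the English-trained model can be applied to German or Spanish input without retraining; passing labelled pairs from several languages to `train_centroids` gives a multilingual variant of the same sketch.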

Tasks

Complex Word Identification, Lexical Simplification