Out in the Open: Finding and Categorising Errors in the Lexical Simplification Pipeline

2014-05-01 · LREC 2014

Matthew Shardlow

Abstract

Lexical simplification is the task of automatically reducing the complexity of a text by identifying difficult words and replacing them with simpler alternatives. Whilst this is a valuable application of natural language generation, rudimentary lexical simplification systems suffer from a high error rate which often results in nonsensical, non-simple text. This paper seeks to characterise and quantify the errors which occur in a typical baseline lexical simplification system. We expose 6 distinct categories of error and propose a classification scheme for these. We also quantify these errors for a moderate size corpus, showing the magnitude of each error type. We find that for 183 identified simplification instances, only 19 (10.38%) result in a valid simplification, with the rest causing errors of varying gravity.
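
To make the two steps named in the abstract concrete (identify difficult words, then replace them with simpler alternatives), here is a minimal sketch of a frequency-based baseline. It is not the system evaluated in the paper: the frequency table, the synonym lists, the threshold, and the function names `is_complex`, `simplify`, and `valid_rate` are all hypothetical and chosen only for illustration. The last helper simply re-derives the reported 19 out of 183 figure.

```python
# Toy resources. Every value here is an illustrative assumption,
# not data from the paper.
WORD_FREQ = {
    "the": 5000, "use": 900, "help": 800, "work": 700, "tools": 300,
    "ease": 150, "employ": 60, "facilitate": 20, "utilise": 12,
}
SYNONYMS = {
    "utilise": ["use", "employ"],
    "facilitate": ["help", "ease"],
}


def is_complex(word, threshold=50):
    """Treat a word as difficult if its (toy) frequency is below the threshold."""
    return WORD_FREQ.get(word, 0) < threshold


def simplify(tokens, threshold=50):
    """Replace each difficult token with its most frequent listed synonym.

    A token is left unchanged when it has no listed synonyms or when no
    synonym is more frequent than the token itself.
    """
    output = []
    for token in tokens:
        candidates = SYNONYMS.get(token, [])
        if is_complex(token, threshold) and candidates:
            best = max(candidates, key=lambda w: WORD_FREQ.get(w, 0))
            if WORD_FREQ.get(best, 0) > WORD_FREQ.get(token, 0):
                output.append(best)
                continue
        output.append(token)
    return output


def valid_rate(valid, total):
    """Percentage of simplification instances judged valid."""
    return 100.0 * valid / total


if __name__ == "__main__":
    print(simplify("the tools facilitate work".split()))
    print(round(valid_rate(19, 183), 2))
```

A sketch like this ignores word sense and context when it picks a replacement, which is one plausible way a rudimentary system could end up producing the nonsensical output the abstract describes.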

Tasks

Lexical Simplification, Text Generation, Text Simplification, valid
