
ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English

2014-05-01 · LREC 2014

Tim vor der Brück, Alexander Mehler, Zahurul Islam


Abstract

The paper describes a procedure for automatically generating a large full-form lexicon of English. We emphasize two statistical methods for lexicon extension and adjustment: a letter-based hidden Markov model (HMM) and a detector of spelling variants and misspellings. The resulting resource, ColLex.en, is evaluated on two tasks: text categorization, and lexical coverage measured on the SUSANNE corpus and the Open American National Corpus (OpenANC).
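The abstract does not say how the spelling-variant and misspelling detector works. As a rough illustration only, the sketch below uses a swapped-in heuristic rather than the authors' method: a rare word form is flagged when a much more frequent form lies within a small Levenshtein distance of it. The function names `levenshtein` and `flag_variants`, the thresholds `max_dist` and `min_ratio`, and the toy frequency counts are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters match)
            ))
        prev = curr
    return prev[-1]


def flag_variants(freq: dict[str, int], max_dist: int = 1,
                  min_ratio: float = 10.0) -> list[tuple[str, str]]:
    """Return (rare_form, frequent_form) pairs that look like misspellings.

    Hypothetical heuristic, not the paper's detector: a form is flagged when
    another form at most `max_dist` edits away is at least `min_ratio` times
    more frequent. Quadratic in vocabulary size; fine for a toy example.
    """
    flagged = []
    # Most frequent forms first, so the first match is the best candidate.
    words = sorted(freq, key=freq.get, reverse=True)
    for rare in words:
        for common in words:
            if common == rare or freq[common] < min_ratio * freq[rare]:
                continue
            # Cheap length check before computing the full edit distance.
            if (abs(len(common) - len(rare)) <= max_dist
                    and levenshtein(rare, common) <= max_dist):
                flagged.append((rare, common))
                break
    return flagged


counts = {"separate": 500, "seperate": 12, "color": 300, "colour": 280, "the": 10000}
print(flag_variants(counts))  # [('seperate', 'separate')]
```

With these toy counts, `seperate` is flagged against `separate`, while `colour` and `color` are not, because both forms are frequent. A frequency ratio alone therefore cannot separate legitimate spelling variants (such as British and American spellings) from misspellings; that distinction would need additional evidence.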
