An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence

2006-11-07 · Communications of the ACM 2006

Douglas L. T. Rohde, Laura M. Gonnerman, and David C. Plaut

Abstract

The lexical semantic system is an important component of human language and cognitive processing. One approach to modeling semantic knowledge makes use of hand-constructed networks or trees of interconnected word senses (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990; Jarmasz & Szpakowicz, 2003). An alternative approach seeks to model word meanings as high-dimensional vectors, which are derived from the co-occurrence of words in unlabeled text corpora (Landauer & Dumais, 1997; Burgess & Lund, 1997a). This paper introduces a new vector-space method for deriving word meanings from large corpora that was inspired by the HAL and LSA models, but which achieves better and more consistent results in predicting human similarity judgments. We explain the new model, known as COALS, and how it relates to prior methods, and then evaluate the various models on a range of tasks, including a novel set of semantic similarity ratings involving both semantically and morphologically related terms.
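The general idea of deriving word vectors from co-occurrence counts can be illustrated with a minimal sketch. This is a generic toy example, not the COALS procedure: the window size, the raw-count weighting, the three-sentence corpus, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made for illustration.

```python
from collections import Counter
import math

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of neighbours seen within `window` tokens."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Context is every token in the window except the word itself.
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus; real models use very large unlabeled corpora.
toy_corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the car needs new tires",
]
vectors = cooccurrence_vectors(toy_corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_tires = cosine(vectors["cat"], vectors["tires"])
```

In this toy corpus, "cat" and "dog" share contexts ("the", "chased") and so score higher than "cat" and "tires", which share none. Models such as HAL, LSA, and COALS apply further weighting and normalization steps on top of raw counts, which this sketch omits.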
