An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence
Douglas L. T. Rohde, Laura M. Gonnerman, and David C. Plaut
Abstract
The lexical semantic system is an important component of human language and cognitive processing. One approach to modeling semantic knowledge makes use of hand-constructed networks or trees of interconnected word senses (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990; Jarmasz & Szpakowicz, 2003). An alternative approach seeks to model word meanings as high-dimensional vectors, which are derived from the co-occurrence of words in unlabeled text corpora (Landauer & Dumais, 1997; Burgess & Lund, 1997a). This paper introduces a new vector-space method for deriving word meanings from large corpora that was inspired by the HAL and LSA models, but which achieves better and more consistent results in predicting human similarity judgments. We explain the new model, known as COALS, and how it relates to prior methods, and then evaluate the various models on a range of tasks, including a novel set of semantic similarity ratings involving both semantically and morphologically related terms.
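To make the general idea of co-occurrence vectors concrete, the following is a minimal sketch in Python. It is not the COALS procedure, nor HAL or LSA, none of which the abstract specifies in detail. It assumes a symmetric, unweighted context window, raw counts with no normalization, and cosine similarity as the comparison measure. The function names, the window size, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a sparse vector of raw co-occurrence counts.

    Assumption: a symmetric window of `window` tokens on each side,
    with every neighbor counted equally (no distance weighting).
    """
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Toy corpus, purely illustrative; real models use very large text corpora.
corpus = "the cat chased the mouse the dog chased the cat".split()
vecs = cooccurrence_vectors(corpus, window=2)

sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_mouse = cosine(vecs["cat"], vecs["mouse"])
print(f"cat~dog: {sim_cat_dog:.3f}  cat~mouse: {sim_cat_mouse:.3f}")
```

On a corpus this small the scores carry no meaning. The sketch only shows the pipeline shape shared by this family of models: count contexts, build one vector per word, then compare vectors.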