
mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing

2016-05-01 · LREC 2016

Silvio Cordeiro, Carlos Ramisch, Aline Villavicencio


Abstract

This paper presents mwetoolkit+sem: an extension of the mwetoolkit that estimates semantic compositionality scores for multiword expressions (MWEs) based on word embeddings. First, we describe our implementation of vector-space operations working on distributional vectors. The compositionality score is based on the cosine distance between the MWE vector and the composition of the vectors of its member words. Our generic system can handle several types of word embeddings and MWE lists, and may combine individual word representations using several composition techniques. We evaluate our implementation on a dataset of 1042 English noun compounds, comparing different configurations of the underlying word embeddings and word-composition models. We show that our vector-based scores model non-compositionality better than standard association measures such as log-likelihood.
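The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the mwetoolkit+sem implementation or its API: the function names, the choice of additive composition over length-normalised member vectors, and the toy vectors are all assumptions made for illustration. The sketch reports cosine similarity (one minus the cosine distance), so a higher value suggests a more compositional expression.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return float(np.dot(u, v) / (norm_u * norm_v))


def compose_additive(member_vectors):
    """Hypothetical composition function: sum of length-normalised member vectors.

    Additive composition is only one possible technique; it is assumed here
    because it is the simplest to show.
    """
    total = np.zeros_like(np.asarray(member_vectors[0], dtype=float))
    for vec in member_vectors:
        vec = np.asarray(vec, dtype=float)
        norm = np.linalg.norm(vec)
        if norm > 0.0:
            total += vec / norm
    return total


def compositionality_score(mwe_vector, member_vectors):
    """Similarity between the MWE's own vector and the composed member vectors."""
    composed = compose_additive(member_vectors)
    return cosine_similarity(np.asarray(mwe_vector, dtype=float), composed)


# Toy 2-dimensional vectors (invented for illustration, not real embeddings).
aligned_score = compositionality_score([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
orthogonal_score = compositionality_score([1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup, an MWE vector that points in the same direction as the composed vector scores close to 1, while one orthogonal to it scores close to 0. With real embeddings, the MWE vector would typically come from treating the whole expression as a single token during training.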
