Token Level Identification of Multiword Expressions Using Contextual Information

2020-07-01, WS 2020

Reyhaneh Hashempour, Aline Villavicencio

Abstract

Studies on detecting idiomatic expressions mostly focus on discovering potentially idiomatic expressions while disregarding context. However, many idioms, such as "kick the bucket", can be idiomatic or literal depending on the context. In this work, we use the Context2Vec model to include contextual information. The model learns a generic context embedding function from large corpora using a bidirectional LSTM. We build a simple nearest-neighbor classifier on top of Context2Vec, which outperforms the popular average-of-word-embeddings context representation. Through a lexical substitution task, we further show that the Context2Vec model is able to place MWEs into distinct 'sense' (idiomatic/literal) regions of the embedding space, whereas a traditional word embedding model, i.e. Skip-Gram, lacks this ability.
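
As a rough illustration of the classification setup the abstract describes, the sketch below labels a new occurrence of an expression by finding its nearest labeled neighbor in context-embedding space. It uses the average-of-word-embeddings representation that the abstract names as the baseline; in the Context2Vec setting, the output of the learned context encoder would take the place of `average_embedding`. The 2-d word vectors, the training contexts, the choice of cosine similarity, and the use of a single nearest neighbor are all assumptions made for illustration and are not taken from the paper.

```python
import math

# Toy 2-d word vectors, invented for illustration only (not from the paper).
TOY_VECTORS = {
    "died": [0.9, 0.1],
    "funeral": [0.8, 0.2],
    "old": [0.7, 0.3],
    "water": [0.1, 0.9],
    "foot": [0.2, 0.8],
    "spilled": [0.15, 0.85],
}


def average_embedding(words, vectors):
    """Represent a context as the mean of its known word vectors."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known context words")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor_label(query, labeled):
    """Return the label of the labeled vector most similar to the query."""
    best_vector, best_label = max(labeled, key=lambda pair: cosine(query, pair[0]))
    return best_label


# Hypothetical labeled contexts for occurrences of "kick the bucket".
train = [
    (average_embedding(["old", "died"], TOY_VECTORS), "idiomatic"),
    (average_embedding(["foot", "water"], TOY_VECTORS), "literal"),
]

# A new occurrence whose context mentions a funeral.
query = average_embedding(["funeral"], TOY_VECTORS)
predicted = nearest_neighbor_label(query, train)
print(predicted)
```

Keeping the context representation behind a single function makes it straightforward to swap the averaging baseline for another encoder and compare the two under the same nearest-neighbor classifier.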

Tasks

Word Embeddings
