Automated Generation of Multilingual Clusters for the Evaluation of Distributed Representations

2016-11-04

Philip Blair, Yuval Merhav, Joel Barry


Code

github.com/belph/wiki-sem-500
Official implementation (linked in the paper)

Abstract

We propose a language-agnostic way of automatically generating sets of semantically similar clusters of entities along with sets of "outlier" elements, which may then be used to perform an intrinsic evaluation of word embeddings in the outlier detection task. We used our methodology to create a gold-standard dataset, which we call WikiSem500, and evaluated multiple state-of-the-art embeddings. The results show a correlation between performance on this dataset and performance on sentiment analysis.
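The outlier detection evaluation mentioned in the abstract can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' scoring code from github.com/belph/wiki-sem-500. It assumes that every item already has an embedding vector, that each outlier is tested against the cluster one at a time, and that the predicted outlier is the item with the lowest mean cosine similarity to the rest of its group. The function names `predict_outlier` and `outlier_accuracy` and the toy 2-D vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def predict_outlier(group: List[str], emb: Dict[str, Sequence[float]]) -> str:
    """Hypothetical scorer: pick the item least similar, on average, to the others."""
    def mean_sim(i: int) -> float:
        others = [w for j, w in enumerate(group) if j != i]
        return sum(cosine(emb[group[i]], emb[o]) for o in others) / len(others)
    best = min(range(len(group)), key=mean_sim)
    return group[best]

def outlier_accuracy(cluster: List[str], outliers: List[str],
                     emb: Dict[str, Sequence[float]]) -> float:
    """Fraction of outliers correctly identified when each is added to the cluster alone."""
    hits = 0
    for out in outliers:
        if predict_outlier(cluster + [out], emb) == out:
            hits += 1
    return hits / len(outliers)

# Toy example (made-up vectors): three fruits point roughly along x, vehicles along y.
emb = {
    "apple": [1.0, 0.1], "banana": [0.9, 0.2], "cherry": [1.0, 0.0],
    "truck": [0.0, 1.0], "car": [0.1, 1.0],
}
cluster = ["apple", "banana", "cherry"]
print(predict_outlier(cluster + ["truck"], emb))        # → truck
print(outlier_accuracy(cluster, ["truck", "car"], emb))  # → 1.0
```

Mean similarity to the group is a deliberately simple choice; the paper's evaluation may rank items with a different compactness measure and report metrics other than plain accuracy.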

Tasks

Outlier Detection, Sentiment Analysis, Word Embeddings
