Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations
2016-06-02 · ACL 2016 · Code Available
Alexandre Salle, Marco Idiart, Aline Villavicencio
Code
- github.com/alexandres/lexvec (official, in paper)
Abstract
In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word similarity and analogy tasks shows that LexVec matches and often outperforms state-of-the-art methods on many of these tasks.
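The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation (see the official repository for that), only a toy reading of the abstract under stated assumptions: the corpus is a list of integer word ids, the loss is plain squared error against the PPMI value, and negative samples are drawn from a unigram distribution raised to the 0.75 power (an assumption borrowed from word2vec-style negative sampling). The names `ppmi_matrix` and `train_sketch` are hypothetical. Because window sampling performs one update per observed co-occurrence, frequent pairs are updated more often, so their errors are penalized more heavily; the negative samples pull in pairs that rarely or never co-occur.

```python
import numpy as np


def ppmi_matrix(corpus, vocab_size, window=2):
    """Count symmetric window co-occurrences and convert them to PPMI."""
    counts = np.zeros((vocab_size, vocab_size))
    for i, w in enumerate(corpus):
        lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[w, corpus[j]] += 1.0
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)   # keep only the positive part


def train_sketch(corpus, vocab_size, dim=8, window=2, negatives=2,
                 lr=0.05, epochs=50, seed=0):
    """SGD factorization of PPMI with window and negative sampling (toy)."""
    rng = np.random.default_rng(seed)
    ppmi = ppmi_matrix(corpus, vocab_size, window)
    W = rng.normal(scale=0.1, size=(vocab_size, dim))  # word vectors
    C = rng.normal(scale=0.1, size=(vocab_size, dim))  # context vectors

    # Assumed noise distribution: smoothed unigram counts.
    noise = np.bincount(corpus, minlength=vocab_size).astype(float) ** 0.75
    noise /= noise.sum()

    def step(w, c):
        # One squared-error update pushing W[w] . C[c] toward PPMI[w, c].
        err = W[w] @ C[c] - ppmi[w, c]
        grad_w, grad_c = err * C[c], err * W[w]
        W[w] -= lr * grad_w
        C[c] -= lr * grad_c
        return 0.5 * err ** 2

    losses = []
    for _ in range(epochs):
        loss = 0.0
        for i, w in enumerate(corpus):
            # Window sampling: one update per observed (word, context) pair.
            lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    loss += step(w, corpus[j])
            # Negative sampling: a few updates on randomly drawn contexts.
            for c in rng.choice(vocab_size, size=negatives, p=noise):
                loss += step(w, c)
        losses.append(loss)
    return W, C, losses
```

A call such as `train_sketch([0, 1, 2, 0, 1, 3, 4, 3, 4, 2] * 4, vocab_size=5)` returns the word vectors, the context vectors, and the per-epoch loss. The hyperparameter defaults are arbitrary choices for a toy corpus, not values taken from the paper.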