
GNEG: Graph-Based Negative Sampling for word2vec

2018-07-01 · ACL 2018

Zheng Zhang, Pierre Zweigenbaum


Abstract

Negative sampling is an important component of word2vec for learning distributed word representations. We hypothesize that taking into account global, corpus-level information and generating a different noise distribution for each target word better satisfies the requirements on negative examples for each training word than the original frequency-based distribution. To this end, we pre-compute word co-occurrence statistics from the corpus and apply network algorithms, such as random walks, to them. We test this hypothesis through a set of experiments whose results show that our approach boosts performance on the word analogy task by about 5% and on word similarity tasks by about 1% compared to the skip-gram negative sampling baseline.
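The idea described in the abstract can be illustrated with a minimal sketch: build a word co-occurrence matrix, row-normalize it into a transition matrix, and take a few random-walk steps so that each row becomes a target-specific noise distribution for drawing negatives. This is only an assumption-laden toy (the corpus, window size, and walk length below are hypothetical and not taken from the paper):

```python
import numpy as np

# Toy corpus; in the paper, co-occurrence statistics are pre-computed
# from a large training corpus.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Symmetric co-occurrence counts within a context window (size 2 here).
C = np.zeros((V, V))
window = 2
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                C[idx[w], idx[sent[j]]] += 1.0

# Row-normalize into a transition matrix and take random-walk steps;
# row t of P_k is then a noise distribution specific to target word t.
P = C / C.sum(axis=1, keepdims=True)
steps = 2  # hypothetical walk length
P_k = np.linalg.matrix_power(P, steps)

def sample_negatives(target, n=5, rng=np.random.default_rng(0)):
    """Draw n negative examples from the target word's walk distribution."""
    return list(rng.choice(vocab, size=n, p=P_k[idx[target]]))
```

In contrast to the frequency-based unigram distribution used by standard skip-gram negative sampling, which is shared by all target words, each row of `P_k` here depends on the target word's position in the co-occurrence graph.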
