MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier
Code: github.com/eske/multivec (official implementation)
Abstract
We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words). MultiVec includes word2vec's features, paragraph vector (batch and online) and bivec for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and is aimed at being fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and crosslingual document classification.
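As an illustration of comparing words and sequences of words, here is a minimal sketch in Python. It is not MultiVec's API. The abstract does not say which distance measures the toolkit provides, so cosine similarity and averaging word vectors to represent a sequence are assumptions made for this example. The function names and the toy two-dimensional embeddings are hypothetical as well.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding.

    Assumption: a sequence is represented by the mean of its word
    vectors; out-of-vocabulary words are skipped.
    """
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def sequence_similarity(seq_a, seq_b, embeddings):
    """Cosine similarity between the mean vectors of two word sequences."""
    va = mean_vector(seq_a, embeddings)
    vb = mean_vector(seq_b, embeddings)
    if va is None or vb is None:
        return 0.0  # no known words in at least one sequence
    return cosine(va, vb)


# Toy embeddings, invented for illustration only (not trained vectors).
toy = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}

if __name__ == "__main__":
    print(cosine(toy["good"], toy["great"]))
    print(sequence_similarity(["good", "movie"], ["great", "movie"], toy))
    print(sequence_similarity(["good", "movie"], ["bad", "movie"], toy))
```

With trained embeddings, the `toy` dictionary would be replaced by vectors loaded from a model file. The similarity functions themselves would not need to change.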