SOTAVerified

MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

2016-05-01 · LREC 2016 · Code Available

Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier

Abstract

We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words). MultiVec includes word2vec's features, paragraph vector (batch and online) and bivec for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and is aimed at being fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and crosslingual document classification.