A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

2018-05-14 · ACL 2018 · Code Available

Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora


Code

github.com/NLPrinceton/ALaCarte
Official · In paper

Abstract

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable on the fly in the future when a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in context to learn high-quality embeddings and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.
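
The abstract describes a linear transformation learned by linear regression from pretrained word vectors and then applied on the fly to a new feature's contexts. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the regression maps each word's averaged context vector onto its pretrained vector, and the names `learn_transform` and `induce_embedding`, the plain least-squares fit, and the toy inputs are illustrative assumptions.

```python
import numpy as np


def learn_transform(context_avgs, word_vectors):
    """Fit a matrix A by least squares so that word_vectors[i] ~= A @ context_avgs[i].

    context_avgs: (n_words, dim) array, one averaged context vector per word.
    word_vectors: (n_words, dim) array of pretrained vectors for the same words.
    Both inputs are assumed to be precomputed from a corpus (hypothetical setup).
    """
    # lstsq solves context_avgs @ X ~= word_vectors, so X is the transpose of A.
    X, *_ = np.linalg.lstsq(context_avgs, word_vectors, rcond=None)
    return X.T


def induce_embedding(A, contexts, vectors):
    """Embed a new feature from one or more contexts (lists of tokens).

    vectors: dict mapping known tokens to pretrained vectors.
    Tokens without a pretrained vector are skipped (an assumption of this sketch).
    """
    per_context = []
    for tokens in contexts:
        known = [vectors[t] for t in tokens if t in vectors]
        if known:
            per_context.append(np.mean(known, axis=0))
    if not per_context:
        raise ValueError("no context contains a token with a pretrained vector")
    # Average over contexts, then apply the learned linear transform.
    return A @ np.mean(per_context, axis=0)
```

Under these assumptions a single usage example is enough to produce a vector: `induce_embedding(A, [["the", "new", "gadget", "works"]], vectors)` averages the known context words and multiplies the result by `A`.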

Tasks

Document Classification, Domain Adaptation, Sentiment Analysis, Text Classification, Transfer Learning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CR	byte mLSTM	Accuracy	90.6	—	Unverified
MPQA	byte mLSTM	Accuracy	88.8	—	Unverified
MR	byte mLSTM	Accuracy	86.8	—	Unverified
SST-2 Binary classification	byte mLSTM	Accuracy	91.7	—	Unverified
SST-5 Fine-grained classification	byte mLSTM	Accuracy	54.6	—	Unverified
