Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

2017-01-01 · TACL 2017

Gábor Berend


Abstract

In this paper we propose and carefully evaluate a sequence labeling framework which solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties, as it retains over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e., 150 sentences per language.
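To make the idea of sparse indicator features derived from dense embeddings concrete, here is a minimal sketch. It is not the paper's implementation. It assumes a fixed toy dictionary (in practice the dictionary would be learned), an L1-regularized reconstruction objective solved with ISTA, and a hypothetical feature naming scheme (`P<j>` / `N<j>` for positive / negative coefficients). The names `sparse_code`, `indicator_features`, `lam`, and `n_iter` are illustrative only.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t (proximal operator of the L1 norm)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_code(x, D, lam=0.1, n_iter=200):
    """Approximately minimize 0.5 * ||x - sum_j a_j D[j]||^2 + lam * ||a||_1.

    x: dense embedding (list of d floats).
    D: toy dictionary, a list of k atoms, each a list of d floats (assumed given).
    Uses ISTA (iterative soft-thresholding) with a fixed step size.
    """
    k, d = len(D), len(x)
    # The squared Frobenius norm of D is a safe (loose) upper bound on the
    # Lipschitz constant of the gradient, so 1 / L is a valid step size.
    L = sum(v * v for atom in D for v in atom)
    a = [0.0] * k
    for _ in range(n_iter):
        # Residual of the current reconstruction.
        r = [sum(a[j] * D[j][i] for j in range(k)) - x[i] for i in range(d)]
        # Gradient of the squared-error term with respect to each coefficient.
        grad = [sum(D[j][i] * r[i] for i in range(d)) for j in range(k)]
        a = [soft_threshold(a[j] - grad[j] / L, lam / L) for j in range(k)]
    return a


def indicator_features(a):
    """Turn nonzero coefficients into sign-marked indicator features."""
    return [("P%d" if c > 0 else "N%d") % j for j, c in enumerate(a) if c != 0.0]


# Toy example: a 2-dimensional "embedding" and a 2-atom dictionary.
D = [[1.0, 0.0], [0.0, 1.0]]
features = indicator_features(sparse_code([0.5, -0.02], D))
print(features)  # ['P0']
```

In a tagger, such indicator strings for a word and its neighbors would be fed to a linear sequence model in place of the dense vector itself; the small second coordinate above is zeroed out by the L1 penalty, which is what keeps the feature set sparse.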

Tasks

Feature Engineering, Named Entity Recognition (NER), Part-Of-Speech Tagging, Word Embeddings
