Improved CCG Parsing with Semi-supervised Supertagging

2014-01-01 · TACL 2014

Mike Lewis, Mark Steedman

Abstract

Current supervised parsers are limited by the size of their labelled training data, so improving them with unlabelled data is an important goal. We show how a state-of-the-art CCG parser can be enhanced by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to generalize better from the labelled data, and allows us to assign lexical categories accurately without depending on a POS tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8%), Wikipedia (1.8%) and biomedical (3.4%) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.
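The core idea, predicting a word's CCG lexical category (supertag) from embeddings of the surrounding words instead of from POS tags, can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption: the toy 2-dimensional `EMBEDDINGS`, the hand-set `WEIGHTS`, the ±1 word window and the helper names. It is a generic window-based softmax classifier, not the authors' model or the parser's actual code. The `beta` cut-off imitates the multi-tagging style of C&C-style supertaggers, which pass the parser every category whose probability is within a factor `beta` of the best one.

```python
import math

# Hypothetical toy embeddings (2-d for readability); a real system would load
# pretrained vectors produced by an unsupervised embedding model.
EMBEDDINGS = {
    "<pad>": [0.0, 0.0],
    "the": [0.9, 0.1],
    "dog": [0.1, 0.8],
    "barks": [0.2, -0.7],
}
UNKNOWN = [0.0, 0.1]  # hypothetical fallback vector for out-of-vocabulary words

# Hypothetical hand-set weights: one vector per category over the 6 window
# features (left word, current word, right word; 2 dims each). No POS features.
WEIGHTS = {
    "NP/N": [0, 0, 5, 0, 0, 0],
    "N": [0, 0, 0, 5, 0, 0],
    "S\\NP": [0, 0, 0, -5, 0, 0],
}

def window_features(words, i, width=1):
    """Concatenate embeddings of words[i-width .. i+width], padding at the edges."""
    feats = []
    for j in range(i - width, i + width + 1):
        if 0 <= j < len(words):
            vec = EMBEDDINGS.get(words[j].lower(), UNKNOWN)
        else:
            vec = EMBEDDINGS["<pad>"]
        feats.extend(vec)
    return feats

def softmax(scores):
    """Numerically stable softmax over a dict of category -> score."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def supertag(words, weights, i, beta=0.1):
    """Return categories for words[i] whose probability is >= beta * best, best first."""
    x = window_features(words, i)
    scores = {cat: sum(w * f for w, f in zip(ws, x)) for cat, ws in weights.items()}
    probs = softmax(scores)
    best = max(probs.values())
    keep = [c for c, p in probs.items() if p >= beta * best]
    return sorted(keep, key=lambda c: -probs[c])

sentence = ["The", "dog", "barks"]
print([supertag(sentence, WEIGHTS, i) for i in range(len(sentence))])
# -> [['NP/N'], ['N'], ['S\\NP']]
```

In a trained system the weights would be learned from labelled supertagged data and the embeddings would come from a large unlabelled corpus; a word rarely or never seen in the labelled data can still receive a sensible category if its embedding lies near those of words that were seen.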

Tasks

CCG Supertagging Dependency Parsing Natural Language Inference POS Question Answering Structured Prediction Word Embeddings