Synthetic Data Made to Order: The Case of Parsing

2018-10-01 · EMNLP 2018 · Code Available

Dingquan Wang, Jason Eisner

Abstract

To approximately parse an unfamiliar language, it helps to have a treebank of a similar language. But what if the closest available treebank still has the wrong word order? We show how to (stochastically) permute the constituents of an existing dependency treebank so that its surface part-of-speech statistics approximately match those of the target language. The parameters of the permutation model can be evaluated for quality by dynamic programming and tuned by gradient descent (up to a local optimum). This optimization procedure yields trees for a new artificial language that resembles the target language. We show that delexicalized parsers for the target language can be successfully trained using such "made to order" artificial languages.
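
As a rough illustration of what permuting the constituents of a dependency tree means, here is a minimal Python sketch. It is not the authors' model: the paper tunes a parameterized permutation model by gradient descent, whereas this toy version shuffles each head and its dependents uniformly at random. The names `Node`, `permute`, and `bigram_counts` and the example tree are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One word of a dependency tree, reduced to its part-of-speech tag."""
    pos: str
    deps: List["Node"] = field(default_factory=list)  # dependent subtrees


def permute(node: Node, rng: random.Random) -> List[str]:
    """Linearize the tree after shuffling each head together with its dependents.

    Assumption: orderings are sampled uniformly, standing in for a learned model.
    """
    # None marks the position of the head word among its dependents.
    items = [None] + list(node.deps)
    rng.shuffle(items)
    tags: List[str] = []
    for item in items:
        if item is None:
            tags.append(node.pos)
        else:
            # Each dependent subtree is emitted as one contiguous block.
            tags.extend(permute(item, rng))
    return tags


def bigram_counts(tags: List[str]) -> Counter:
    """Surface POS bigram counts, one simple kind of word-order statistic."""
    return Counter(zip(tags, tags[1:]))


# Toy tree for a sentence such as "the dog chased a cat".
tree = Node("VERB", [Node("NOUN", [Node("DET")]), Node("NOUN", [Node("DET")])])
sample = permute(tree, random.Random(0))
counts = bigram_counts(sample)
```

Comparing `bigram_counts` of many such samples against counts from target-language POS sequences would give a crude measure of fit. The paper instead evaluates its model's parameters by dynamic programming, which this sketch does not attempt.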