Targeted Syntactic Evaluation of Language Models
Rebecca Marvin, Tal Linzen
Code
- github.com/BeckyMarvin/LM_syneval (official, PyTorch)
- github.com/jennhu/reflexive-anaphor-licensing (PyTorch)
- github.com/icewing1996/bert-syntax
- github.com/yoavg/bert-syntax
- github.com/huggingface/bert-syntax
Abstract
We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora, and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than to the ungrammatical one. In an experiment using this dataset, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.
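The evaluation criterion the abstract describes can be sketched in a few lines: sum per-token log-probabilities for each sentence in a minimal pair and check that the grammatical variant scores higher. The sketch below is a minimal illustration, not the paper's code; the `train_bigram` toy model is a hypothetical stand-in (the paper evaluates an LSTM language model), used here only so the example runs end to end.

```python
import math
from collections import Counter

def sentence_logprob(sentence, logprob_fn):
    """Sum per-token log-probabilities under a language model.
    `logprob_fn(token, history)` returns log P(token | history)."""
    tokens = sentence.split()
    return sum(logprob_fn(tok, tokens[:i]) for i, tok in enumerate(tokens))

def prefers_grammatical(grammatical, ungrammatical, logprob_fn):
    """The paper's criterion: a model handles a minimal pair correctly
    iff it assigns higher probability to the grammatical variant."""
    return (sentence_logprob(grammatical, logprob_fn)
            > sentence_logprob(ungrammatical, logprob_fn))

# Toy stand-in model: add-one-smoothed bigram LM (illustration only).
def train_bigram(corpus):
    bigrams, prev_counts, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent.split()
        vocab.update(toks)
        for prev, cur in zip(toks, toks[1:]):
            bigrams[(prev, cur)] += 1
            prev_counts[prev] += 1
    V = len(vocab)
    def logprob(tok, history):
        prev = history[-1] if history else "<s>"
        return math.log((bigrams[(prev, tok)] + 1) / (prev_counts[prev] + V))
    return logprob

lm = train_bigram(["the author laughs", "the authors laugh"])
print(prefers_grammatical("the author laughs", "the author laugh", lm))  # True
```

The singular/plural agreement pair above ("the author laughs" vs. "the author laugh") is in the spirit of the dataset's subject-verb agreement items; any model exposing per-token log-probabilities can be plugged in as `logprob_fn`.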