
Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding

2022-01-16 · ACL ARR January 2022

Anonymous


Abstract

In the age of large transformer language models, linguistic benchmarks play an important role in diagnosing models' abilities and limitations in natural language understanding. However, current benchmarks have significant shortcomings. In particular, they do not reveal how well a language model captures the distinct linguistic phenomena essential for language understanding and reasoning. In this paper, we introduce Curriculum, a new large-scale NLI benchmark for evaluating broad-coverage linguistic phenomena. We show that our benchmark poses a more difficult challenge for current state-of-the-art models than existing suites. Our experiments also expose the limitations of existing benchmark datasets. In addition, we find that sequential training on selected linguistic phenomena effectively improves generalization on adversarial NLI when training examples are limited.
