SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
Code
- github.com/nyu-mll/jiant (official, in paper; PyTorch) ★ 1,674
- github.com/google-research/prompt-tuning (JAX) ★ 697
- github.com/ledzy/badam (PyTorch) ★ 285
- github.com/debugml/incontext_influences (PyTorch) ★ 15
- github.com/colinzhaoust/intrinsic_fewshot_hardness ★ 4
- github.com/DataScienceNigeria/SUPERGLUE-from-Facebook-AI-DeepMind-University-of-Washington-and-New-York-University. ★ 0
Abstract
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
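The paper's own software toolkit is jiant (linked above). As a lighter-weight illustration of what a SuperGLUE task looks like, the sketch below loads one of the benchmark's tasks through the Hugging Face `datasets` library, which mirrors the benchmark under the name "super_glue"; the config and field names ("boolq", "passage", "question", "label") follow that mirror and are an assumption about your installed `datasets` version, not something specified in the paper.

```python
# Minimal sketch: inspect one SuperGLUE task via the Hugging Face `datasets`
# library (an assumption for illustration; it is not the paper's jiant toolkit).
# Requires: pip install datasets
from datasets import load_dataset

# BoolQ is one of the SuperGLUE tasks: yes/no questions over short passages.
boolq = load_dataset("super_glue", "boolq")

example = boolq["train"][0]
print(example["passage"][:100])  # supporting passage (truncated for display)
print(example["question"])       # yes/no question about the passage
print(example["label"])          # 1 = yes/true, 0 = no/false in this mirror
```

Other task configs in the same mirror (e.g. "cb", "copa", "rte", "wic", "wsc") can be loaded the same way; the leaderboard's single-number score averages performance across all of them.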