NAIL: A Challenging Benchmark for Naïve Logical Reasoning
Xinbo Zhang, Changzhi Sun, Yue Zhang, Lei Li, Hao Zhou
Abstract
Logical reasoning over natural text is an important capability on the path toward human-level intelligence. Existing datasets are either too limited to train and evaluate logical reasoning capability (e.g., LogiQA and ReClor) or not oriented toward logical reasoning (e.g., SQuAD and HotpotQA). In this paper, we focus on a specific category of logical reasoning, named naive logical reasoning, and propose a new large-scale benchmark, named NAIL, targeted at learning and evaluating models' capabilities for naive logical reasoning. NAIL is sourced from standardized exams such as the Chinese National Civil Servants Examination and the Law School Admission Test. Furthermore, to collect more data, we propose to imitate the examples of standardized exams rather than designing new instances from scratch. NAIL is available in both Chinese and English, containing a total of 10,296 × 2 instances. Empirical results show that current state-of-the-art neural models struggle on NAIL with very poor accuracy (the best result is 30.10% for English NAIL and 36.15% for Chinese NAIL), while human experts achieve nearly 100% accuracy. Further results indicate that human imitations can significantly help models learn logic from natural text.