Can Pre-trained Models Really Generate Single-Step Textual Entailment?

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

We investigate the task of generating textual entailment (GTE). Unlike prior work on recognizing textual entailment (RTE), also known as natural language inference (NLI), GTE requires models to have deeper reasoning capabilities: generating an entailment from premises rather than making a prediction about a given premise–entailment pair. We argue that existing adapted datasets are limited and inadequate for training and evaluating human-like reasoning in GTE. In this paper, we propose a new large-scale benchmark, named , targeted at learning and evaluating models' capabilities in GTE. It consists of 15k instances, each containing a pair of premise statements and a human-annotated entailment. The benchmark is constructed by first retrieving instances from a knowledge base, and then augmenting each instance with several complementary instances via 7 manually crafted transformations. We demonstrate that even extensively fine-tuned pre-trained models perform poorly on this benchmark: the best generator models produce a valid textual entailment only 59.1% of the time. Further, to motivate future advances, we provide a detailed analysis showing significant gaps between baseline and human performance.
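The construction procedure described in the abstract — retrieve seed instances from a knowledge base, then augment each with complementary variants via manually crafted transformations — can be sketched as follows. This is a minimal illustration only: the function names, the example transformations, and the sample data are assumptions, not the paper's actual implementation or its 7 transformations.

```python
# Hypothetical sketch of the benchmark-construction pipeline: each seed
# instance is a (premise1, premise2, entailment) triple, and each manually
# crafted transformation maps one instance to a complementary instance.
# The two transformations below are illustrative stand-ins; the paper
# reports 7 such transformations.

def negate_entailment(instance):
    # Illustrative transformation: wrap the entailment in a negation.
    p1, p2, e = instance
    return (p1, p2, "it is not the case that " + e)

def swap_premises(instance):
    # Illustrative transformation: reverse the order of the premises.
    p1, p2, e = instance
    return (p2, p1, e)

TRANSFORMATIONS = [negate_entailment, swap_premises]

def build_benchmark(seed_instances):
    """Augment each retrieved seed instance with its transformed variants."""
    benchmark = []
    for inst in seed_instances:
        benchmark.append(inst)
        for transform in TRANSFORMATIONS:
            benchmark.append(transform(inst))
    return benchmark

# Toy seed retrieved from a (hypothetical) knowledge base.
seed = [("All birds can fly", "Tweety is a bird", "Tweety can fly")]
data = build_benchmark(seed)
print(len(data))  # 1 seed + 2 transformed variants = 3
```

With the paper's 7 transformations, each retrieved instance would expand into up to 8 benchmark instances in the same way.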
