A large annotated corpus for learning natural language inference

2015-08-21 · EMNLP 2015 · Code Available

Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning


Code

github.com/kawine/dataset_difficulty (PyTorch, ★ 91)
github.com/hpprc/simple-simcse-ja (PyTorch, ★ 69)
github.com/songyang0716/NLP/tree/master/natural_language_inference/sentence_encoding_RNN (PyTorch, ★ 0)

Abstract

Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
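
As a rough illustration of the data the abstract describes, a labeled sentence pair can be modeled as a premise, a hypothesis, and one of three labels. The sketch below is not the corpus's loading code: the `NLIPair` class, the `label_counts` helper, and the example sentences are hypothetical and written only for illustration. The three label names (entailment, contradiction, neutral) are the ones SNLI uses.

```python
from collections import Counter
from dataclasses import dataclass

# The three SNLI labels.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NLIPair:
    """One labeled sentence pair (hypothetical container, not an SNLI API)."""
    premise: str
    hypothesis: str
    label: str

    def __post_init__(self):
        # Reject anything outside the three-way label scheme.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def label_counts(pairs):
    """Count how many pairs carry each label, including labels with zero pairs."""
    counts = Counter(p.label for p in pairs)
    return {label: counts.get(label, 0) for label in LABELS}


# Made-up examples in the style of caption-based pairs; NOT taken from the corpus.
EXAMPLES = [
    NLIPair("A man is playing a guitar on stage.",
            "A person is making music.", "entailment"),
    NLIPair("A man is playing a guitar on stage.",
            "The man is asleep at home.", "contradiction"),
    NLIPair("A man is playing a guitar on stage.",
            "The man is a famous musician.", "neutral"),
]
```

Calling `label_counts(EXAMPLES)` gives one pair per label. All three hypotheses share one premise, which loosely mirrors the caption-grounded setup the abstract mentions.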

Tasks

Image Captioning, Natural Language Inference, Sentence

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
SNLI	+ Unigram and bigram features	% Test Accuracy	78.2	—	Unverified
SNLI	100D LSTM encoders	% Test Accuracy	77.6	—	Unverified
SNLI	Unlexicalized features	% Test Accuracy	50.4	—	Unverified
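
The lexicalized row in the table refers to a classifier that uses unigram and bigram features, scored by test accuracy in percent. A minimal sketch of what such indicator features and the metric could look like follows. The whitespace tokenizer, the feature-name prefixes, and the `ngram_features` and `accuracy` helpers are assumptions made for illustration; this is not the paper's feature extractor or evaluation script.

```python
def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return sentence.lower().split()


def ngram_features(sentence):
    """Return the set of unigram and bigram indicator feature names."""
    tokens = tokenize(sentence)
    unigrams = {f"uni={t}" for t in tokens}
    # Pair each token with its successor to form bigrams.
    bigrams = {f"bi={a}_{b}" for a, b in zip(tokens, tokens[1:])}
    return unigrams | bigrams


def accuracy(predicted, gold):
    """Percentage of predicted labels that match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)
```

Feature sets like these would typically feed a linear classifier. The set representation keeps each feature a binary indicator, so a repeated word counts once.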
