Visual Entailment Task for Visually-Grounded Language Learning

2018-11-26

Ning Xie, Farley Lai, Derek Doran, Asim Kadav

Code

github.com/necla-ml/snli-ve (official, linked in the paper)

Abstract

We introduce a new inference task, Visual Entailment (VE), which differs from traditional Textual Entailment (TE) tasks in that the premise is an image rather than a natural language sentence. We propose a novel dataset for VE tasks, SNLI-VE (publicly available at https://github.com/necla-ml/SNLI-VE), built from the Stanford Natural Language Inference (SNLI) corpus and Flickr30k. We introduce a differentiable architecture, the Explainable Visual Entailment model (EVE), to tackle the VE problem. EVE and several other state-of-the-art visual question answering (VQA) based models are evaluated on the SNLI-VE dataset, which facilitates grounded language understanding and provides insights into how modern VQA-based models perform.
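To make the task format concrete, the following is a minimal Python sketch of how a VE instance could be derived from an SNLI record. It relies on the fact that SNLI premises are Flickr30k image captions, so the textual premise can be swapped for its source image while the hypothesis and the three-way label (entailment, neutral, contradiction) are kept. The record keys `captionID`, `sentence2`, and `gold_label` are assumed to follow the SNLI JSONL release. `VEExample`, `snli_to_ve`, `convert`, and the rule of dropping no-consensus pairs are hypothetical illustrations, not the authors' SNLI-VE construction code in the official repository.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Three-way label set used by SNLI; assumed to carry over unchanged to VE.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class VEExample:
    """Hypothetical VE instance: image premise, text hypothesis, label."""
    image_id: str    # Flickr30k image identifier replacing the text premise
    hypothesis: str  # natural language hypothesis, kept from SNLI
    label: str       # one of LABELS


def image_id_from_caption_id(caption_id: str) -> str:
    """Assumes SNLI-style caption IDs of the form '<flickr_id>.jpg#<n>'."""
    filename = caption_id.split("#", 1)[0]   # '<flickr_id>.jpg'
    return filename.rsplit(".", 1)[0]        # '<flickr_id>'


def snli_to_ve(record: dict) -> Optional[VEExample]:
    """Swap the caption premise for its image; skip pairs without a gold label.

    SNLI marks pairs whose annotators reached no consensus with '-';
    dropping them here is an assumption made for this sketch.
    """
    label = record.get("gold_label")
    if label not in LABELS:
        return None
    return VEExample(
        image_id=image_id_from_caption_id(record["captionID"]),
        hypothesis=record["sentence2"],
        label=label,
    )


def convert(records: Iterable[dict]) -> Iterator[VEExample]:
    """Yield VE examples for every SNLI record that has a usable label."""
    for rec in records:
        example = snli_to_ve(rec)
        if example is not None:
            yield example


if __name__ == "__main__":
    # Made-up records for illustration only; not taken from SNLI.
    records = [
        {"captionID": "1234567890.jpg#0",
         "sentence1": "A dog runs across a grassy field.",
         "sentence2": "An animal is outside.",
         "gold_label": "entailment"},
        {"captionID": "1234567890.jpg#0",
         "sentence1": "A dog runs across a grassy field.",
         "sentence2": "The dog is asleep indoors.",
         "gold_label": "-"},
    ]
    for ex in convert(records):
        print(ex.image_id, ex.label, ex.hypothesis)
```

Because several SNLI premises describe the same Flickr30k image, many hypotheses end up sharing one image premise in this sketch; keeping the image ID rather than decoded pixels keeps each example lightweight, with image loading left to a separate step.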

Tasks

Grounded language learning, Natural Language Inference, Question Answering, Sentence, Visual Entailment, Visual Question Answering, Visual Question Answering (VQA)