
Visual Entailment

Visual Entailment (VE) is a task consisting of image-sentence pairs in which the premise is an image, rather than a natural language sentence as in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.
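As a sketch of the task structure: each example pairs a premise image with a hypothesis sentence, and (following the common SNLI-VE formulation) the model predicts one of three labels. The class and field names below are illustrative assumptions, not part of any specific benchmark's API.

```python
from dataclasses import dataclass
from enum import Enum

class VELabel(Enum):
    """Three-way label set used in the SNLI-VE formulation of VE."""
    ENTAILMENT = "entailment"        # image supports the hypothesis
    NEUTRAL = "neutral"              # image neither supports nor refutes it
    CONTRADICTION = "contradiction"  # image refutes the hypothesis

@dataclass
class VEExample:
    """One image-sentence pair: the premise is an image, not text."""
    image_path: str   # premise image (e.g. a Flickr30k photo)
    hypothesis: str   # natural-language hypothesis about the image
    label: VELabel    # gold entailment judgment

# Hypothetical example of how a dataset entry might look:
example = VEExample(
    image_path="images/0001.jpg",
    hypothesis="Two dogs are playing in the snow.",
    label=VELabel.NEUTRAL,
)
```

A VE model then maps `(image_path, hypothesis)` to a `VELabel`, which is why the task is typically scored as three-class classification accuracy.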

Papers

Showing 11-20 of 56 papers

Title | Status | Hype
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Code | 1
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Code | 1
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1
MixGen: A New Multi-Modal Data Augmentation | Code | 1
CoCa: Contrastive Captioners are Image-Text Foundation Models | Code | 1
Visual Spatial Reasoning | Code | 1
Fine-Grained Visual Entailment | Code | 1
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks | Code | 1
Distilled Dual-Encoder Model for Vision-Language Understanding | Code | 1
Check It Again: Progressive Visual Question Answering via Visual Entailment | Code | 1
Page 2 of 6

No leaderboard results yet.