SOTAVerified

Visual Entailment

Visual Entailment (VE) is a task consisting of image-sentence pairs in which the premise is an image rather than a natural-language sentence, as in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.
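The pairing of an image premise with a text hypothesis can be sketched with a minimal data structure and a trivial baseline. This is a hypothetical illustration, not code from any listed paper; the three-way label set follows the SNLI-VE convention (entailment / neutral / contradiction), which is an assumption here since the description above phrases the task as binary.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Literal

# Assumed three-way label set (SNLI-VE-style); the task can also be framed as binary.
Label = Literal["entailment", "neutral", "contradiction"]

@dataclass
class VEExample:
    image_path: str   # the premise is an image, not a sentence
    hypothesis: str   # natural-language hypothesis to check against the image
    label: Label

def majority_baseline(examples: List[VEExample]) -> Label:
    """Trivial baseline: always predict the most frequent training label."""
    return Counter(e.label for e in examples).most_common(1)[0][0]

# Toy data with hypothetical file names, for illustration only.
train = [
    VEExample("img1.jpg", "A dog runs on grass.", "entailment"),
    VEExample("img2.jpg", "The person is indoors.", "contradiction"),
    VEExample("img3.jpg", "A child plays.", "entailment"),
]
print(majority_baseline(train))  # prints the majority label of the toy set
```

A real VE system would replace the baseline with a vision-language model that encodes the image and hypothesis jointly before classifying.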

Papers

Showing 21–30 of 56 papers

Title | Status | Hype
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks | Code | 1
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
Understanding Figurative Meaning through Explainable Visual Entailment | Code | 1
Visual Spatial Reasoning | Code | 1
VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing | Code | 0
Prompt Tuning for Generative Multimodal Pretrained Models | Code | 0
Visual Entailment: A Novel Task for Fine-Grained Image Understanding | Code | 0
Visual Entailment Task for Visually-Grounded Language Learning | Code | 0
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Code | 0
Page 3 of 6

No leaderboard results yet.