SOTAVerified

Visual Entailment

Visual Entailment (VE) is a task over image-sentence pairs in which the premise is an image rather than a natural language sentence, as it is in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.
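To make the task format concrete, the following is a minimal sketch of how a VE example and its evaluation might be represented. The names `Label`, `VEExample`, `accuracy`, and `always_entails` are hypothetical, the image paths are placeholders, and the three-way label set is an assumption carried over from textual entailment rather than something stated on this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence


class Label(Enum):
    # Assumption: a three-way label set, as in textual entailment.
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class VEExample:
    image_path: str  # premise: an image (hypothetical file path)
    hypothesis: str  # hypothesis: a natural language sentence
    label: Label     # gold relation between image and sentence


# A predictor maps (image_path, hypothesis) to a label.
Predictor = Callable[[str, str], Label]


def accuracy(predict: Predictor, examples: Sequence[VEExample]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if not examples:
        raise ValueError("examples must be non-empty")
    correct = sum(
        predict(ex.image_path, ex.hypothesis) == ex.label for ex in examples
    )
    return correct / len(examples)


def always_entails(image_path: str, hypothesis: str) -> Label:
    """Trivial baseline that ignores its inputs; stands in for a real model."""
    return Label.ENTAILMENT
```

A real system would replace `always_entails` with a vision-language model that reads the image; the baseline is only there so the evaluation loop can be exercised end to end.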

Papers

Showing 51-56 of 56 papers

Title | Status | Hype
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | | 0
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | | 0
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning | | 0
Lightweight In-Context Tuning for Multimodal Unified Models | | 0
UNITER: Learning UNiversal Image-TExt Representations | | 0
Logically at Factify 2022: Multimodal Fact Verification | | 0

No leaderboard results yet.