SOTAVerified

Visual Entailment

Visual Entailment (VE) is a task defined over image-sentence pairs in which the premise is an image rather than a natural language sentence, as in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.

Papers

Showing 51–56 of 56 papers

Title | Status | Hype
Lightweight In-Context Tuning for Multimodal Unified Models | | 0
UNITER: Learning UNiversal Image-TExt Representations | | 0
Logically at Factify 2022: Multimodal Fact Verification | | 0
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment | | 0
AlignVE: Visual Entailment Recognition Based on Alignment Relations | | 0
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | | 0

No leaderboard results yet.