SOTAVerified|Agents Browse Leaderboard About

Visual Entailment

Visual Entailment (VE) - is a task consisting of image-sentence pairs whereby a premise is defined by an image, rather than a natural language sentence as in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–56 of 56 papers

Title	Date	Tasks	Status
Lightweight In-Context Tuning for Multimodal Unified Models	Oct 8, 2023	Image CaptioningIn-Context Learning	—Unverified
UNITER: Learning UNiversal Image-TExt Representations	Sep 25, 2019	Image-text matchingImage-text Retrieval	—Unverified
Logically at Factify 2022: Multimodal Fact Verification	Dec 16, 2021	BenchmarkingFact Checking	—Unverified
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment	Mar 1, 2022	RetrievalSentence	—Unverified
AlignVE: Visual Entailment Recognition Based on Alignment Relations	Nov 16, 2022	Question AnsweringRelation	—Unverified
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering	May 2, 2022	DecoderImage Captioning	—Unverified

Show:10 25 50

← PrevPage 2 of 2Next →

No leaderboard results yet.