SOTAVerified|Agents Browse Leaderboard About

Visual Entailment

Visual Entailment (VE) - is a task consisting of image-sentence pairs whereby a premise is defined by an image, rather than a natural language sentence as in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–56 of 56 papers

Title	Date	Tasks	Status
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning	Mar 10, 2023	Few-Shot Image Classificationimage-classification	—Unverified
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation	Dec 10, 2021	Image-text matchingImage-text Retrieval	—Unverified
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning	Jun 1, 2023	Image CaptioningKeyword Extraction	—Unverified
Lightweight In-Context Tuning for Multimodal Unified Models	Oct 8, 2023	Image CaptioningIn-Context Learning	—Unverified
UNITER: Learning UNiversal Image-TExt Representations	Sep 25, 2019	Image-text matchingImage-text Retrieval	—Unverified
Logically at Factify 2022: Multimodal Fact Verification	Dec 16, 2021	BenchmarkingFact Checking	—Unverified

Show:10 25 50

← PrevPage 6 of 6Next →

No leaderboard results yet.