
Visual Entailment

Visual Entailment (VE) is a task over image-sentence pairs in which the premise is an image, rather than a natural language sentence as in traditional Textual Entailment. The goal is to predict whether the image semantically entails the text.
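To make the task's input/output interface concrete, here is a minimal, hypothetical sketch. It assumes the three-way label set used by the SNLI-VE dataset (entailment, neutral, contradiction) and stands in for a real model with a toy cosine-similarity threshold over precomputed image and text embeddings; the `VEExample` class, `predict` function, and thresholds are illustrative inventions, not an API from any of the papers listed below.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Assumption: three-way labels as in the SNLI-VE dataset.
LABELS = ("entailment", "neutral", "contradiction")

@dataclass
class VEExample:
    image_embedding: List[float]   # the premise: an image (here, a precomputed embedding)
    hypothesis: str                # the natural-language hypothesis to verify
    label: Optional[str] = None    # gold label, if annotated

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def predict(image_emb: List[float], text_emb: List[float],
            hi: float = 0.6, lo: float = 0.2) -> str:
    """Toy stand-in for a VE classifier: threshold image-text similarity.

    A real system (e.g. a fine-tuned vision-language model) would learn this
    decision; the thresholds here are arbitrary and purely illustrative.
    """
    s = cosine(image_emb, text_emb)
    if s >= hi:
        return "entailment"
    if s <= lo:
        return "contradiction"
    return "neutral"
```

For example, identical embeddings yield `"entailment"`, orthogonal ones yield `"contradiction"`, and anything in between falls back to `"neutral"`; the point is only the shape of the task (image premise in, one of three labels out), not the scoring rule itself.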

Papers

Showing 21–30 of 56 papers

Title | Status | Hype
How Much Can CLIP Benefit Vision-and-Language Tasks? | Code | 1
Check It Again: Progressive Visual Question Answering via Visual Entailment | Code | 1
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | (no code) | 0
VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing | Code | 0
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks | (no code) | 0
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models | Code | 0
Lightweight In-Context Tuning for Multimodal Unified Models | (no code) | 0
Page 3 of 6

No leaderboard results yet.