
Visual Entailment

Visual Entailment (VE) is a task consisting of image-sentence pairs in which the premise is an image rather than a natural language sentence, as in traditional Textual Entailment. The goal is to predict whether the image semantically entails the text.
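The papers listed below use dedicated vision-language models for this task; purely as an illustration of the task interface, the sketch below reduces VE to textual entailment by first captioning the image. The standard benchmark (SNLI-VE) uses three labels (entailment, neutral, contradiction), so the three MNLI classes map directly. The sketch assumes the Hugging Face transformers library with the public Salesforce/blip-image-captioning-base and roberta-large-mnli checkpoints; the image path and hypothesis are placeholders, and this is a deliberately weak baseline, since a caption discards most of the visual premise.

# Naive "caption-then-NLI" baseline for Visual Entailment:
# 1) caption the image to turn the visual premise into text,
# 2) run a standard textual-entailment model on (caption, hypothesis).
# Checkpoints are illustrative public models, not the methods listed below.
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def visual_entailment(image_path: str, hypothesis: str) -> str:
    """Return 'ENTAILMENT', 'NEUTRAL', or 'CONTRADICTION' for (image, hypothesis)."""
    # Premise: a generated caption standing in for the image content.
    caption = captioner(image_path)[0]["generated_text"]
    inputs = tokenizer(caption, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    return nli_model.config.id2label[logits.argmax(dim=-1).item()]

# Placeholder inputs for illustration only.
print(visual_entailment("photo.jpg", "Two dogs are playing in the snow."))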

Papers

Showing 11–20 of 56 papers

Title | Status | Hype
Good Questions Help Zero-Shot Image Reasoning | Code | 1
Lightweight In-Context Tuning for Multimodal Unified Models | - | 0
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Code | 0
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning | - | 0
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors | Code | 1
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | - | 0
Few-shot Multimodal Multitask Multilingual Learning | - | 0
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations | Code | 1
Compound Tokens: Channel Fusion for Vision-Language Representation Learning | - | 0

Leaderboard

No leaderboard results yet.