SOTAVerified

Visual Entailment

Visual Entailment (VE) is a task consisting of image-sentence pairs in which the premise is an image, rather than a natural language sentence as in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.
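As a concrete illustration, the sketch below shows what a VE example and its evaluation might look like. It assumes the common three-way labeling used by SNLI-VE-style datasets (entailment / neutral / contradiction); the `VEExample` class and field names are hypothetical, not from any specific benchmark.

```python
from dataclasses import dataclass

# Hypothetical structure for one Visual Entailment example: the premise is
# an image (referenced by id) and the hypothesis is a sentence to check
# against it. Labels follow the usual three-way VE scheme.
@dataclass
class VEExample:
    image_id: str    # premise image
    hypothesis: str  # natural-language hypothesis
    label: str       # "entailment", "neutral", or "contradiction"

def accuracy(predictions, examples):
    """Fraction of examples where the predicted label matches the gold label."""
    correct = sum(pred == ex.label for pred, ex in zip(predictions, examples))
    return correct / len(examples)

examples = [
    VEExample("img_001.jpg", "Two dogs are playing in the snow.", "entailment"),
    VEExample("img_001.jpg", "The dogs are siblings.", "neutral"),
    VEExample("img_001.jpg", "There are no animals in the image.", "contradiction"),
]
preds = ["entailment", "neutral", "neutral"]  # model output for each example
print(accuracy(preds, examples))  # 2 of 3 correct
```

A real VE system would produce `preds` from an image-text model; the point here is only the shape of the task: each prediction is judged against a per-pair gold label.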

Papers

Showing 1–10 of 56 papers

Title | Status | Hype
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | Code | 1
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | | 0
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation | Code | 1
Understanding Figurative Meaning through Explainable Visual Entailment | Code | 1
MoPE: Mixture of Prompt Experts for Parameter-Efficient and Scalable Multimodal Fusion | Code | 1
VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing | Code | 0
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks | | 0
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Code | 1
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models | Code | 0
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning | Code | 1
Page 1 of 6

No leaderboard results yet.