SOTAVerified

Visual Entailment

Visual Entailment (VE) is a task over image-sentence pairs in which the premise is an image rather than a natural language sentence, as it would be in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.
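The task setup described above can be sketched as a small data structure plus an accuracy check. This is a minimal illustration, not any particular paper's implementation. The three-way label set (entailment, neutral, contradiction) follows the SNLI-VE convention and is an assumption beyond the definition given here. The names `VELabel`, `VEExample`, `accuracy`, and the image file names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class VELabel(Enum):
    # Three-way label set as used in SNLI-VE; an assumption, not stated in the text above.
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class VEExample:
    image_path: str   # premise: path to an image (hypothetical field name)
    hypothesis: str   # natural language sentence checked against the image
    label: VELabel    # gold relation between the image and the sentence


def accuracy(examples, predictions):
    """Fraction of examples whose predicted label matches the gold label."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(ex.label == pred for ex, pred in zip(examples, predictions))
    return correct / len(examples)


# Two made-up examples sharing one image premise.
examples = [
    VEExample("dog_park.jpg", "A dog is outdoors.", VELabel.ENTAILMENT),
    VEExample("dog_park.jpg", "The dog is asleep on a couch.", VELabel.CONTRADICTION),
]
predictions = [VELabel.ENTAILMENT, VELabel.NEUTRAL]
print(accuracy(examples, predictions))
```

A real system would replace the hand-written `predictions` list with the output of a vision-language model that takes the image and the hypothesis as input.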

Papers

Showing 1-25 of 56 papers

Title | Status | Hype
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks | Code | 1
Understanding Figurative Meaning through Explainable Visual Entailment | Code | 1
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Code | 1
MoPE: Mixture of Prompt Experts for Parameter-Efficient and Scalable Multimodal Fusion | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
Check It Again: Progressive Visual Question Answering via Visual Entailment | Code | 1
Check It Again:Progressive Visual Question Answering via Visual Entailment | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
MixGen: A New Multi-Modal Data Augmentation | Code | 1
CoCa: Contrastive Captioners are Image-Text Foundation Models | Code | 1
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | Code | 1
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | Code | 1
Distilled Dual-Encoder Model for Vision-Language Understanding | Code | 1
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning | Code | 1
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation | Code | 1
Fine-Grained Visual Entailment | Code | 1
Good Questions Help Zero-Shot Image Reasoning | Code | 1
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations | Code | 1
How Much Can CLIP Benefit Vision-and-Language Tasks? | Code | 1
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Code | 1
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Code | 1
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors | Code | 1
Visual Spatial Reasoning | Code | 1

No leaderboard results yet.