SOTAVerified

Visual Entailment

Visual Entailment (VE) is a task over image-sentence pairs in which the premise is an image rather than a natural language sentence, as it would be in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.
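As a rough illustration of the task format described above, the following Python sketch models a single example and a simple accuracy metric. The class and field names, the file paths, and the example sentences are hypothetical. The three-way label set (entailment, neutral, contradiction) is an assumption borrowed from the SNLI-VE dataset convention; the description above only speaks of whether the image entails the text.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    # Assumed three-way scheme (SNLI-VE convention), not stated in the task description.
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class VEExample:
    image_path: str  # premise: an image (hypothetical file path)
    hypothesis: str  # sentence to be judged against the image
    label: Label     # gold annotation


def accuracy(gold, predicted):
    """Return the fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical examples: the image is the premise, the sentence is the hypothesis.
examples = [
    VEExample("images/0001.jpg", "Two people are outdoors.", Label.ENTAILMENT),
    VEExample("images/0001.jpg", "The people are siblings.", Label.NEUTRAL),
]
```

A model for this task would map each `(image_path, hypothesis)` pair to a `Label`, and `accuracy` would compare those predictions with the gold labels.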

Papers

Showing 26-50 of 56 papers

Title | Status | Hype
VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing | Code | 0
Prompt Tuning for Generative Multimodal Pretrained Models | Code | 0
Visual Entailment: A Novel Task for Fine-Grained Image Understanding | Code | 0
Visual Entailment Task for Visually-Grounded Language Learning | Code | 0
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Code | 0
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models | Code | 0
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations | Code | 0
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Code | 0
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks | - | 0
A survey on knowledge-enhanced multimodal learning | - | 0
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | - | 0
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | - | 0
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment | - | 0
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | - | 0
Compound Tokens: Channel Fusion for Vision-Language Representation Learning | - | 0
Playing Lottery Tickets with Vision and Language | - | 0
Pre-training image-language transformers for open-vocabulary tasks | - | 0
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | - | 0
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training | - | 0
Few-shot Multimodal Multitask Multilingual Learning | - | 0
Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing | - | 0
How Much Can CLIP Benefit Vision-and-Language Tasks? | - | 0
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | - | 0
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | - | 0
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning | - | 0

No leaderboard results yet.