SOTAVerified

Visual Entailment

Visual Entailment (VE) is a task defined over image-sentence pairs in which the premise is an image rather than a natural-language sentence, as it would be in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.

Papers

Showing 26-50 of 56 papers

Title | Status | Hype
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1
Prompt Tuning for Generative Multimodal Pretrained Models | Code | 0
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations | Code | 0
MixGen: A New Multi-Modal Data Augmentation | Code | 1
CoCa: Contrastive Captioners are Image-Text Foundation Models | Code | 1
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | | 0
Visual Spatial Reasoning | Code | 1
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | | 0
Fine-Grained Visual Entailment | Code | 1
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment | | 0
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks | Code | 1
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment | | 0
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Code | 0
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | | 0
Logically at Factify 2022: Multimodal Fact Verification | | 0
Distilled Dual-Encoder Model for Vision-Language Understanding | Code | 1
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | | 0
How Much Can CLIP Benefit Vision-and-Language Tasks? | | 0
Check It Again: Progressive Visual Question Answering via Visual Entailment | Code | 1
How Much Can CLIP Benefit Vision-and-Language Tasks? | Code | 1
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | | 0
Check It Again: Progressive Visual Question Answering via Visual Entailment | Code | 1
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training | | 0
Playing Lottery Tickets with Vision and Language | | 0
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | Code | 1

No leaderboard results yet.