SOTAVerified

Visual Entailment

Visual Entailment (VE) is a task over image-sentence pairs in which the premise is an image rather than a natural-language sentence, as in traditional Textual Entailment. The goal is to predict whether the image semantically entails the text.
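As a concrete illustration, the sketch below shows what a VE example and its evaluation look like: an image premise paired with a text hypothesis, a three-way label (as in SNLI-VE-style setups), and plain label accuracy as the metric. The field names and the image path are illustrative assumptions, not tied to any specific dataset release.

```python
# Illustrative sketch of the Visual Entailment task format.
# Field names and the image path are assumptions for demonstration only.

LABELS = ("entailment", "neutral", "contradiction")

example = {
    "premise_image": "path/to/image.jpg",  # the premise is an image, not a sentence
    "hypothesis": "Two dogs are playing in the snow.",
    "label": "entailment",
}

def accuracy(predictions, gold_labels):
    """Fraction of examples where the predicted label matches the gold label."""
    assert len(predictions) == len(gold_labels)
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)

# Toy evaluation: 3 of 4 predictions match the gold labels.
preds = ["entailment", "neutral", "contradiction", "neutral"]
gold = ["entailment", "neutral", "neutral", "neutral"]
print(accuracy(preds, gold))  # 0.75
```

A real VE model would replace the toy predictions with the output of an image-text classifier, but the evaluation interface stays this simple: one label per image-sentence pair.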

Papers

Showing 26–50 of 56 papers

- Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
- "Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning
- Lightweight In-Context Tuning for Multimodal Unified Models
- UNITER: Learning UNiversal Image-TExt Representations
- Logically at Factify 2022: Multimodal Fact Verification
- Playing Lottery Tickets with Vision and Language
- Pre-training image-language transformers for open-vocabulary tasks
- Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
- Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training
- Few-shot Multimodal Multitask Multilingual Learning
- Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing
- How Much Can CLIP Benefit Vision-and-Language Tasks?
- Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
- Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
- AlignVE: Visual Entailment Recognition Based on Alignment Relations
- Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
- ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks
- A survey on knowledge-enhanced multimodal learning
- Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
- VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
- CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment
- CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
- Compound Tokens: Channel Fusion for Vision-Language Representation Learning
- Visual Entailment Task for Visually-Grounded Language Learning [Code]
- Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages [Code]
Page 2 of 3

No leaderboard results yet.