SOTAVerified

Visual Commonsense Reasoning

Papers

Showing 26–50 of 65 papers

Title | Status | Hype
EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning | - | 0
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | Code | 0
Improving Vision-and-Language Reasoning via Spatial Relations Modeling | - | 0
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models | - | 0
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning | - | 0
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions | - | 0
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | - | 0
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images | - | 0
Learning to Agree on Vision Attention for Visual Commonsense Reasoning | - | 0
Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning | - | 0
VASR: Visual Analogies of Situation Recognition | Code | 0
A survey on knowledge-enhanced multimodal learning | - | 0
ILLUME: Rationalizing Vision-Language Models through Human Interactions | Code | 0
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | - | 0
Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks | - | 0
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | - | 0
Attention Mechanism based Cognition-level Scene Understanding | - | 0
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers | Code | 0
Joint Answering and Explanation for Visual Commonsense Reasoning | Code | 0
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | - | 0
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound | - | 0
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning | - | 0
Interpretable Visual Understanding with Cognitive Attention Network | Code | 0
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory | Code | 0
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues | - | 0

No leaderboard results yet.