SOTAVerified

Visual Commonsense Reasoning

Papers

Showing 1-25 of 65 papers

Title | Status | Hype
Compositional Image-Text Matching and Retrieval by Grounding Entities | Code | 0
Generative Visual Commonsense Answering and Explaining with Generative Scene Graph Constructing |  | 0
How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey |  | 0
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0
Improving Visual Commonsense in Language Models via Multiple Image Generation | Code | 1
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? |  | 0
ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition |  | 0
Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models | Code | 2
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR |  | 0
EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning |  | 0
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | Code | 0
Improving Vision-and-Language Reasoning via Spatial Relations Modeling |  | 0
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models |  | 0
A Survey on Interpretable Cross-modal Reasoning | Code | 1
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | Code | 2
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning |  | 0
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions |  | 0
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language |  | 0
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images |  | 0
Learning to Agree on Vision Attention for Visual Commonsense Reasoning |  | 0
Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning |  | 0
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning | Code | 1
VASR: Visual Analogies of Situation Recognition | Code | 0
A survey on knowledge-enhanced multimodal learning |  | 0
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering | Code | 2

No leaderboard results yet.