SOTAVerified

Visual Commonsense Reasoning

Papers

Showing 11–20 of 65 papers

| Title | Status | Hype |
|---|---|---|
| ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | Code | 0 |
| Improving Vision-and-Language Reasoning via Spatial Relations Modeling | | 0 |
| ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models | | 0 |
| A Survey on Interpretable Cross-modal Reasoning | Code | 1 |
| GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | Code | 2 |
| Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning | | 0 |
| GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions | | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | | 0 |
| Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images | | 0 |
| Learning to Agree on Vision Attention for Visual Commonsense Reasoning | | 0 |

No leaderboard results yet.