SOTAVerified

Visual Commonsense Reasoning

Papers

Showing 26-50 of 65 papers

Title | Status | Hype
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning | - | 0
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR | - | 0
Enforcing Reasoning in Visual Commonsense Reasoning | - | 0
EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning | - | 0
Generative Visual Commonsense Answering and Explaining with Generative Scene Graph Constructing | - | 0
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions | - | 0
How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey | - | 0
Improving Vision-and-Language Reasoning via Spatial Relations Modeling | - | 0
InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining | - | 0
KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning | - | 0
Learning to Agree on Vision Attention for Visual Commonsense Reasoning | - | 0
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound | - | 0
ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition | - | 0
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | - | 0
Playing Lottery Tickets with Vision and Language | - | 0
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues | - | 0
Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning | - | 0
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning | - | 0
Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks | - | 0
To Root Artificial Intelligence Deeply in Basic Science for a New Generation of AI | - | 0
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training | - | 0
UNITER: Learning UNiversal Image-TExt Representations | - | 0
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models | - | 0
VisualCOMET: Reasoning about the Dynamic Context of a Still Image | - | 0
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines | Code | 0
Page 2 of 3

No leaderboard results yet.