SOTAVerified

Visual Commonsense Reasoning

Papers

Showing 26–50 of 65 papers

Title | Status | Hype
ILLUME: Rationalizing Vision-Language Models through Human Interactions | Code | 0
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization |  | 0
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | Code | 1
Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks |  | 0
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks |  | 0
Attention Mechanism based Cognition-level Scene Understanding |  | 0
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers | Code | 0
All in One: Exploring Unified Video-Language Pre-training | Code | 2
Joint Answering and Explanation for Visual Commonsense Reasoning | Code | 0
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks |  | 0
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound |  | 0
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning |  | 0
Towards artificial general intelligence via a multimodal foundation model | Code | 1
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning | Code | 1
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics | Code | 1
Interpretable Visual Understanding with Cognitive Attention Network | Code | 0
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory | Code | 0
MERLOT: Multimodal Neural Script Knowledge Models | Code | 1
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues |  | 0
Playing Lottery Tickets with Vision and Language |  | 0
Unifying Vision-and-Language Tasks via Text Generation | Code | 1
KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning |  | 0
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs | Code | 1
To Root Artificial Intelligence Deeply in Basic Science for a New Generation of AI |  | 0
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1

No leaderboard results yet.