| Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues | May 15, 2021 | Multimodal Reasoning, Natural Language Inference | Unverified | 0 |
| Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning | Jan 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning | Dec 16, 2021 | Visual Commonsense Reasoning | Unverified | 0 |
| Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks | Apr 25, 2022 | Few-Shot Learning, In-Context Learning | Unverified | 0 |
| To Root Artificial Intelligence Deeply in Basic Science for a New Generation of AI | Sep 11, 2020 | Brain Computer Interface, Decision Making | Unverified | 0 |
| Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training | Aug 16, 2019 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models | Oct 9, 2023 | Image Captioning, Visual Commonsense Reasoning | Unverified | 0 |
| VisualCOMET: Reasoning about the Dynamic Context of a Still Image | Apr 22, 2020 | Visual Commonsense Reasoning | Unverified | 0 |
| TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines | Oct 31, 2019 | Attribute, Question Answering | Code Available | 0 |