| KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning | Dec 13, 2020 | Sentence, Visual Commonsense Reasoning | —Unverified | 0 | 0 |
| Learning to Agree on Vision Attention for Visual Commonsense Reasoning | Feb 4, 2023 | Visual Commonsense Reasoning, Visual Reasoning | —Unverified | 0 | 0 |
| MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound | Jan 7, 2022 | Action Classification, Navigate | —Unverified | 0 | 0 |
| ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition | Jun 9, 2024 | Action Recognition, Object Recognition | —Unverified | 0 | 0 |
| On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | May 24, 2022 | Descriptive, Image Captioning | —Unverified | 0 | 0 |
| Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text Retrieval, Question Answering | —Unverified | 0 | 0 |
| Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues | May 15, 2021 | Multimodal Reasoning, Natural Language Inference | —Unverified | 0 | 0 |
| Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning | Jan 30, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning | Dec 16, 2021 | Visual Commonsense Reasoning | —Unverified | 0 | 0 |
| Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks | Apr 25, 2022 | Few-Shot Learning, In-Context Learning | —Unverified | 0 | 0 |
| To Root Artificial Intelligence Deeply in Basic Science for a New Generation of AI | Sep 11, 2020 | Brain Computer Interface, Decision Making | —Unverified | 0 | 0 |
| Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training | Aug 16, 2019 | Image-text matching, Image-text Retrieval | —Unverified | 0 | 0 |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text matching, Image-text Retrieval | —Unverified | 0 | 0 |
| ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models | Oct 9, 2023 | Image Captioning, Visual Commonsense Reasoning | —Unverified | 0 | 0 |
| VisualCOMET: Reasoning about the Dynamic Context of a Still Image | Apr 22, 2020 | Visual Commonsense Reasoning | —Unverified | 0 | 0 |