| Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Dec 8, 2024 | Misconceptions, Multiple-choice | Code Available | 0 | 5 |
| Compositional Image-Text Matching and Retrieval by Grounding Entities | May 4, 2025 | Image Captioning, Image-text matching | Code Available | 0 | 5 |
| From Recognition to Cognition: Visual Commonsense Reasoning | Nov 27, 2018 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Code Available | 0 | 5 |
| Connective Cognition Network for Directional Visual Commonsense Reasoning | Dec 1, 2019 | Sentence, Visual Commonsense Reasoning | Code Available | 0 | 5 |
| Heterogeneous Graph Learning for Visual Commonsense Reasoning | Oct 25, 2019 | Graph Learning, Visual Commonsense Reasoning | Code Available | 0 | 5 |
| Joint Answering and Explanation for Visual Commonsense Reasoning | Feb 25, 2022 | Knowledge Distillation, Question Answering | Code Available | 0 | 5 |
| TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines | Oct 31, 2019 | Attribute, Question Answering | Code Available | 0 | 5 |
| TAB-VCR: Tags and Attributes based VCR Baselines | Dec 1, 2019 | Attribute, Question Answering | Code Available | 0 | 5 |
| Think Visually: Question Answering through Virtual Imagery | May 25, 2018 | Question Answering, Visual Commonsense Reasoning | Code Available | 0 | 5 |
| Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | Apr 22, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 | 0 |
| A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | Unverified | 0 | 0 |
| Attention Mechanism based Cognition-level Scene Understanding | Apr 17, 2022 | Question Answering, Scene Understanding | Unverified | 0 | 0 |
| Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images | Mar 13, 2023 | Common Sense Reasoning, Explanation Generation | Unverified | 0 | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | Unverified | 0 | 0 |
| CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | Jan 15, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 | 0 |
| Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? | Jun 11, 2024 | Adversarial Text, Image Generation | Unverified | 0 | 0 |
| Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning | May 26, 2023 | Object Recognition, Visual Commonsense Reasoning | Unverified | 0 | 0 |
| Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR | May 27, 2024 | Question Answering, TAG | Unverified | 0 | 0 |
| Enforcing Reasoning in Visual Commonsense Reasoning | Oct 21, 2019 | Question Answering, Reinforcement Learning | Unverified | 0 | 0 |
| EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning | Apr 22, 2024 | Visual Commonsense Reasoning | Unverified | 0 | 0 |
| Generative Visual Commonsense Answering and Explaining with Generative Scene Graph Constructing | Jan 15, 2025 | Visual Commonsense Reasoning | Unverified | 0 | 0 |
| GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions | May 24, 2023 | Object, Question Answering | Unverified | 0 | 0 |
| How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey | Dec 11, 2024 | Image Captioning, Question Answering | Unverified | 0 | 0 |
| Improving Vision-and-Language Reasoning via Spatial Relations Modeling | Nov 9, 2023 | Position regression, Relation | Unverified | 0 | 0 |
| InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining | Mar 30, 2020 | Image Retrieval, Image-text matching | Unverified | 0 | 0 |