| Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning | Jan 30, 2023 | Language Modeling | Unverified | 0 |
| Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning | Jan 1, 2023 | Language Modeling | Code Available | 1 |
| VASR: Visual Analogies of Situation Recognition | Dec 8, 2022 | Common Sense Reasoning, Triplet | Code Available | 0 |
| A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | Unverified | 0 |
| Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering | Sep 20, 2022 | Multimodal Deep Learning, Multimodal Reasoning | Code Available | 2 |
| ILLUME: Rationalizing Vision-Language Models through Human Interactions | Aug 17, 2022 | Image Captioning, Question Answering | Code Available | 0 |
| On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | May 24, 2022 | Descriptive, Image Captioning | Unverified | 0 |
| PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | May 23, 2022 | Language Modeling | Code Available | 1 |
| Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks | Apr 25, 2022 | Few-Shot Learning, In-Context Learning | Unverified | 0 |
| Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | Apr 22, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 |