| Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Sep 11, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering? | Jun 1, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| An experimental study of the vision-bottleneck in VQA | Feb 14, 2022 | Object, Question Answering | —Unverified | 0 | 0 |
| Learning to Disambiguate by Asking Discriminative Questions | Aug 9, 2017 | Benchmarking, Image Captioning | —Unverified | 0 | 0 |
| Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering | Sep 14, 2022 | Adversarial Robustness, Question Answering | —Unverified | 0 | 0 |
| VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents | Apr 14, 2025 | Question Answering, RAG | —Unverified | 0 | 0 |
| V-Doc : Visual questions answers with Documents | May 27, 2022 | Question Answering, Question Generation | —Unverified | 0 | 0 |
| V-Doc: Visual Questions Answers With Documents | Jan 1, 2022 | Question Answering, Question Generation | —Unverified | 0 | 0 |
| An Evaluation of GPT-4V and Gemini in Online VQA | Dec 17, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Neural Reasoning, Fast and Slow, for Video Question Answering | Jul 10, 2019 | Natural Questions, Question Answering | —Unverified | 0 | 0 |
| Learning to Recognize the Unseen Visual Predicates | Sep 25, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Learning to Select Question-Relevant Relations for Visual Question Answering | Jun 1, 2021 | Graph Attention, Question Answering | —Unverified | 0 | 0 |
| Learning to Specialize with Knowledge Distillation for Visual Question Answering | Dec 1, 2018 | General Classification, General Knowledge | —Unverified | 0 | 0 |
| Fine-tuning Large Language Models with Sequential Instructions | Mar 12, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Learning Visual Knowledge Memory Networks for Visual Question Answering | Jun 13, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| An Empirical Study on the Language Modal in Visual Question Answering | May 17, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision | Apr 20, 2020 | counterfactual, image-classification | —Unverified | 0 | 0 |
| Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models | Nov 23, 2023 | Language Modelling, Large Language Model | —Unverified | 0 | 0 |
| LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? | Mar 25, 2025 | Autonomous Navigation, Question Answering | —Unverified | 0 | 0 |
| Fine-Grained Retrieval-Augmented Generation for Visual Question Answering | Feb 28, 2025 | Question Answering, RAG | —Unverified | 0 | 0 |
| Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model | Jun 10, 2022 | Question Answering, Task 2 | —Unverified | 0 | 0 |
| Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation | Jul 18, 2023 | Image Generation, Question Answering | —Unverified | 0 | 0 |
| Leveraging Medical Visual Question Answering with Supporting Facts | May 28, 2019 | Diversity, Medical Visual Question Answering | —Unverified | 0 | 0 |
| Leveraging Visual Question Answering for Image-Caption Ranking | May 4, 2016 | Image Retrieval, Question Answering | —Unverified | 0 | 0 |
| Leveraging Visual Question Answering to Improve Text-to-Image Synthesis | Oct 28, 2020 | Auxiliary Learning, Image Generation | —Unverified | 0 | 0 |
| Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models | May 30, 2025 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| Lightweight In-Context Tuning for Multimodal Unified Models | Oct 8, 2023 | Image Captioning, In-Context Learning | —Unverified | 0 | 0 |