| LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Apr 15, 2025 | Image Captioning, Question Answering | —Unverified | 0 |
| Improving Automatic VQA Evaluation Using Large Language Models | Oct 4, 2023 | In-Context Learning, Question Answering | —Unverified | 0 |
| Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | Apr 15, 2022 | Contrastive Learning, Question Answering | —Unverified | 0 |
| Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning | Jan 28, 2024 | Data Augmentation, Question Answering | —Unverified | 0 |
| H2OVL-Mississippi Vision Language Models Technical Report | Oct 17, 2024 | Document AI, Visual Question Answering | —Unverified | 0 |
| Improving Multi-modal Large Language Model through Boosting Vision Capabilities | Oct 17, 2024 | Decoder, Language Modeling | —Unverified | 0 |
| CPL: Counterfactual Prompt Learning for Vision and Language Models | Oct 19, 2022 | counterfactual, image-classification | —Unverified | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | —Unverified | 0 |
| CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Dec 16, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Improving Visual Question Answering by Referring to Generated Paragraph Captions | Jun 14, 2019 | Decoder, Image Captioning | —Unverified | 0 |
| Auto-Parsing Network for Image Captioning and Visual Question Answering | Aug 24, 2021 | Image Captioning, Question Answering | —Unverified | 0 |
| Improving VQA and its Explanations by Comparing Competing Explanations | Jun 28, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Grounding Task Assistance with Multimodal Cues from a Single Demonstration | May 2, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Apr 2, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering | Jan 1, 2023 | Continual Learning, Language Modelling | —Unverified | 0 |
| Grounding Complex Navigational Instructions Using Scene Graphs | Jun 3, 2021 | Question Answering, reinforcement-learning | —Unverified | 0 |
| In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering | Aug 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding | Mar 3, 2024 | Visual Question Answering | —Unverified | 0 |
| Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | May 22, 2025 | Answer Generation, Question Answering | —Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Nov 16, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | Jun 20, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | Nov 6, 2023 | CoLA, Question Answering | —Unverified | 0 |
| Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning | May 19, 2024 | Multimodal Reasoning, Question Answering | —Unverified | 0 |
| Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space | Apr 2, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs | May 3, 2025 | Chunking, Question Answering | —Unverified | 0 |