| Merlin: Empowering Multimodal LLMs with Foresight Minds | Nov 30, 2023 | Visual Question Answering | —Unverified | 0 |
| MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering | Nov 11, 2022 | Medical Visual Question Answering, Question Answering | —Unverified | 0 |
| FunBench: Benchmarking Fundus Reading Skills of MLLMs | Mar 2, 2025 | Anatomy, Benchmarking | —Unverified | 0 |
| AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering | Jul 28, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 |
| From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Mar 10, 2025 | Math, Question Answering | —Unverified | 0 |
| From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason | Oct 1, 2019 | Graph Neural Network, Question Answering | —Unverified | 0 |
| From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering | Jun 25, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark | Feb 24, 2025 | All, Multimodal Reasoning | —Unverified | 0 |
| Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry | Nov 17, 2024 | Question Answering, Scene Understanding | —Unverified | 0 |
| From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Nov 5, 2024 | Change Detection, Contrastive Learning | —Unverified | 0 |
| From Pixels to Objects: Cubic Visual Attention for Visual Question Answering | Jun 4, 2022 | Object, Question Answering | —Unverified | 0 |
| From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts | Nov 30, 2018 | Novel Concepts, Question Answering | —Unverified | 0 |
| From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities | Nov 1, 2023 | Navigate, Question Answering | —Unverified | 0 |
| CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering | Jan 1, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 |
| From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models | Jan 1, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data | May 6, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration | Mar 17, 2025 | Denoising, Question Answering | —Unverified | 0 |
| COCO is "ALL" You Need for Visual Instruction Fine-tuning | Jan 17, 2024 | All, Image Captioning | —Unverified | 0 |
| ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering | Nov 18, 2015 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Memory Augmented Neural Networks for Natural Language Processing | Sep 1, 2017 | AI Agent, Language Modeling | —Unverified | 0 |
| freePruner: A Training-free Approach for Large Multimodal Model Acceleration | Nov 23, 2024 | Quantization, Question Answering | —Unverified | 0 |
| Free Form Medical Visual Question Answering in Radiology | Jan 23, 2024 | Diagnostic, Form | —Unverified | 0 |
| A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation | Jun 12, 2023 | Image Captioning, Machine Translation | —Unverified | 0 |
| MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Apr 18, 2024 | Decision Making, Medical Visual Question Answering | —Unverified | 0 |
| Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Aug 23, 2024 | Instruction Following, Knowledge Distillation | —Unverified | 0 |