| Estimating semantic structure for the VQA answer space | Jun 10, 2020 | General Classification, Question Answering | —Unverified | 0 | 0 |
| ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation | Nov 9, 2022 | Contrastive Learning, Decoder | —Unverified | 0 | 0 |
| An Analysis of Visual Question Answering Algorithms | Mar 28, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Medical Visual Question Answering: A Survey | Nov 19, 2021 | Medical Visual Question Answering, Question Answering | —Unverified | 0 | 0 |
| Medical visual question answering using joint self-supervised learning | Feb 25, 2023 | Decoder, Diversity | —Unverified | 0 | 0 |
| ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers | Dec 27, 2024 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering | Oct 18, 2022 | Passage Retrieval, Question Answering | —Unverified | 0 | 0 |
| Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Aug 14, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility | May 30, 2025 | Decision Making, Medical Diagnosis | —Unverified | 0 | 0 |
| Analysis on Image Set Visual Question Answering | Mar 31, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Jul 8, 2025 | Articles, Multimodal Reasoning | —Unverified | 0 | 0 |
| MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Apr 18, 2024 | Decision Making, Medical Visual Question Answering | —Unverified | 0 | 0 |
| MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning | Feb 26, 2025 | Domain Generalization, Medical Image Analysis | —Unverified | 0 | 0 |
| MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | Dec 4, 2023 | Instruction Following, Language Modeling | —Unverified | 0 | 0 |
| MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering | Jun 18, 2025 | Multimodal Reasoning, Question Answering | —Unverified | 0 | 0 |
| Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation | Mar 6, 2025 | Active Learning, Image Segmentation | —Unverified | 0 | 0 |
| Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry | Nov 17, 2024 | Question Answering, Scene Understanding | —Unverified | 0 | 0 |
| Memory Augmented Neural Networks for Natural Language Processing | Sep 1, 2017 | AI Agent, Language Modeling | —Unverified | 0 | 0 |
| Merlin: Empowering Multimodal LLMs with Foresight Minds | Nov 30, 2023 | Visual Question Answering | —Unverified | 0 | 0 |
| Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering | Jun 7, 2025 | In-Context Learning, Meta-Learning | —Unverified | 0 | 0 |
| MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification | May 29, 2024 | Hallucination, Image Captioning | —Unverified | 0 | 0 |
| From Training-Free to Adaptive: Empirical Insights into MLLMs' Understanding of Detection Information | Jan 31, 2024 | Hallucination, object-detection | —Unverified | 0 | 0 |
| MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering | Nov 11, 2022 | Medical Visual Question Answering, Question Answering | —Unverified | 0 | 0 |
| Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns | Apr 3, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| MGA-VQA: Multi-Granularity Alignment for Visual Question Answering | Jan 25, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |