| Enhancing Generalization in Medical Visual Question Answering Tasks via Gradient-Guided Model Perturbation | Mar 5, 2024 | Data Augmentation, Medical Visual Question Answering | —Unverified | 0 | 0 |
| ViLMedic: a framework for research at the intersection of vision and language in medical AI | May 1, 2022 | Medical Visual Question Answering, Question Answering | —Unverified | 0 | 0 |
| Enhancing Explainability in Multimodal Large Language Models Using Ontological Context | Sep 27, 2024 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents | Oct 25, 2023 | All, Document Classification | —Unverified | 0 | 0 |
| MIMOQA: Multimodal Input Multimodal Output Question Answering | Jun 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis | Jul 3, 2024 | Position, Question Answering | —Unverified | 0 | 0 |
| Mindstorms in Natural Language-Based Societies of Mind | May 26, 2023 | 3D Generation, Image Captioning | —Unverified | 0 | 0 |
| Enhancing BERT-Based Visual Question Answering through Keyword-Driven Sentence Selection | Oct 13, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolutional Approach | May 1, 2024 | Computational Efficiency, Question Answering | —Unverified | 0 | 0 |
| Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering | Dec 30, 2024 | Image Captioning, Object Recognition | —Unverified | 0 | 0 |
| Enforcing Reasoning in Visual Commonsense Reasoning | Oct 21, 2019 | Question Answering, Reinforcement Learning | —Unverified | 0 | 0 |
| End-to-End Vision Tokenizer Tuning | May 15, 2025 | Image Generation, Question Answering | —Unverified | 0 | 0 |
| Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories | Jun 15, 2023 | Question Answering, Retrieval | —Unverified | 0 | 0 |
| Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | Mar 12, 2022 | Image Captioning, Knowledge Distillation | —Unverified | 0 | 0 |
| Where is this coming from? Making groundedness count in the evaluation of Document VQA models | Mar 24, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | Nov 16, 2021 | Image Captioning, Knowledge Distillation | —Unverified | 0 | 0 |
| Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding | Sep 10, 2024 | Hallucination, Image Captioning | —Unverified | 0 | 0 |
| EmoAssist: Emotional Assistant for Visual Impairment Community | Feb 13, 2025 | Emotional Intelligence, Question Answering | —Unverified | 0 | 0 |
| Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy | Mar 26, 2025 | Hallucination, Image Captioning | —Unverified | 0 | 0 |
| Data-augmented phrase-level alignment for mitigating object hallucination | May 28, 2024 | Data Augmentation, Hallucination | —Unverified | 0 | 0 |
| Mitigating the Impact of Attribute Editing on Face Recognition | Mar 12, 2024 | Attribute, Face Recognition | —Unverified | 0 | 0 |
| MIVC: Multiple Instance Visual Component for Visual-Language Models | Dec 28, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision | Oct 10, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Embodied Scene Understanding for Vision Language Models via MetaVQA | Jan 15, 2025 | Decision Making, Question Answering | —Unverified | 0 | 0 |
| Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering | Jun 3, 2024 | Diversity, Question Answering | —Unverified | 0 | 0 |