LOIS: Looking Out of Instance Semantics for Visual Question Answering Jul 26, 2023 Question Answering Visual Question Answering
— Unverified 00 Long-Form Answers to Visual Questions from Blind and Low Vision People Aug 12, 2024 Form Visual Question Answering (VQA)
— Unverified 00 Look, Learn and Leverage (L^3): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment Aug 30, 2024 Question Answering Representation Learning
— Unverified 00 Look, Read and Ask: Learning to Ask Questions by Reading Text in Images Nov 23, 2022 Optical Character Recognition (OCR) Question Answering
— Unverified 00 Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts Jun 24, 2024 Mathematical Reasoning Visual Question Answering (VQA)
— Unverified 00 Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey Dec 18, 2021 Data Augmentation Few-Shot Learning
— Unverified 00 LRRA: A Transparent Neural-Symbolic Reasoning Framework for Real-World Visual Question Answering Aug 1, 2021 Question Answering Visual Question Answering
— Unverified 00 Can You Explain That? Lucid Explanations Help Human-AI Collaborative Image Retrieval Apr 5, 2019 Image Retrieval Question Answering
— Unverified 00 Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects Dec 8, 2023 Image Captioning object-detection
— Unverified 00 M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation Aug 29, 2024 Instruction Following Medical Report Generation
— Unverified 00 MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering Mar 24, 2025 Graph Neural Network Question Answering
— Unverified 00 Making the V in Text-VQA Matter Aug 1, 2023 Optical Character Recognition (OCR) TextVQA
— Unverified 00 Making Video Quality Assessment Models Sensitive to Frame Rate Distortions May 21, 2022 Video Quality Assessment Visual Question Answering (VQA)
— Unverified 00 Making Video Quality Assessment Models Robust to Bit Depth Apr 25, 2023 Video Quality Assessment Visual Question Answering (VQA)
— Unverified 00 MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning Oct 9, 2022 Image-text Retrieval multimodal interaction
— Unverified 00 MANGO: Enhancing the Robustness of VQA Models via Adversarial Noise Generation Jan 16, 2022 Logical Reasoning Question Answering
— Unverified 00 Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems Jan 1, 2024 Question Answering Visual Question Answering
— Unverified 00 MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering Dec 19, 2022 Chart Question Answering Data Summarization
— Unverified 00 Measuring CLEVRness: Black-box Testing of Visual Reasoning Models Sep 29, 2021 Benchmarking Diagnostic
— Unverified 00 Measuring CLEVRness: Blackbox testing of Visual Reasoning Models Feb 24, 2022 Benchmarking Diagnostic
— Unverified 00 Measuring Machine Intelligence Through Visual Question Answering Aug 31, 2016 Image Captioning Question Answering
— Unverified 00 Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model Nov 19, 2024 Language Modeling Language Modelling
— Unverified 00 MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning May 22, 2025 Diagnostic Visual Question Answering (VQA)
— Unverified 00 Medical Visual Question Answering: A Survey Nov 19, 2021 Medical Visual Question Answering Question Answering
— Unverified 00 Medical visual question answering using joint self-supervised learning Feb 25, 2023 Decoder Diversity
— Unverified 00 MedSG-Bench: A Benchmark for Medical Image Sequences Grounding May 17, 2025 Visual Grounding Visual Question Answering (VQA)
— Unverified 00 MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale Apr 18, 2024 Decision Making Medical Visual Question Answering
— Unverified 00 MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation Dec 4, 2023 Instruction Following Language Modeling
— Unverified 00 MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering Jun 18, 2025 Multimodal Reasoning Question Answering
— Unverified 00 Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry Nov 17, 2024 Question Answering Scene Understanding
— Unverified 00 Memory Augmented Neural Networks for Natural Language Processing Sep 1, 2017 AI Agent Language Modeling
— Unverified 00 MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models Jun 2, 2023 In-Context Learning Language Modeling
— Unverified 00 MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering Nov 11, 2022 Medical Visual Question Answering Question Answering
— Unverified 00 MGA-VQA: Multi-Granularity Alignment for Visual Question Answering Jan 25, 2022 Question Answering Visual Question Answering
— Unverified 00 MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM Jul 16, 2025 Attribute Face Swapping
— Unverified 00 MIMOQA: Multimodal Input Multimodal Output Question Answering Jun 1, 2021 Question Answering Visual Question Answering
— Unverified 00 MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis Jul 3, 2024 Position Question Answering
— Unverified 00 MiniVQA - A resource to build your tailored VQA competition Jun 1, 2021 BIG-bench Machine Learning Visual Question Answering (VQA)
— Unverified 00 Misleading Failures of Partial-input Baselines May 14, 2019 Natural Language Inference Visual Question Answering (VQA)
— Unverified 00 Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning Mar 15, 2024 Hallucination Instruction Following
— Unverified 00 Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering Jun 3, 2024 Diversity Question Answering
— Unverified 00 MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning May 31, 2025 Diagnostic Reinforcement Learning (RL)
— Unverified 00 MMED: A Multi-domain and Multi-modality Event Dataset Apr 4, 2019 Articles Question Answering
— Unverified 00 MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning Nov 5, 2024 MME Question Answering
— Unverified 00 MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models May 26, 2025 Image Generation Visual Question Answering (VQA)
— Unverified 00 MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants Oct 13, 2021 intent-classification Intent Classification
— Unverified 00 MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework Apr 14, 2025 Question Answering RAG
— Unverified 00 MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence May 29, 2025 Multiple-choice Spatial Reasoning
— Unverified 00 MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs Jun 24, 2024 Question Answering Visual Question Answering
— Unverified 00 MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering Dec 6, 2021 Language Modelling Question Answering