| What Large Language Models Bring to Text-rich VQA? | Nov 13, 2023 | Image Comprehension, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| Improving Users' Mental Model with Attention-directed Counterfactual Edits | Oct 13, 2021 | counterfactual, Question Answering | —Unverified | 0 | 0 |
| Improving Visual Question Answering by Referring to Generated Paragraph Captions | Jun 14, 2019 | Decoder, Image Captioning | —Unverified | 0 | 0 |
| Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions | Apr 6, 2023 | In-Context Learning, Question Answering | —Unverified | 0 | 0 |
| Improving VQA and its Explanations by Comparing Competing Explanations | Jun 28, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions | Jun 8, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks | Dec 3, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| A Restricted Visual Turing Test for Deep Scene and Event Understanding | Dec 6, 2015 | Question Answering, Video Captioning | —Unverified | 0 | 0 |
| Generic Attention-model Explainability by Weighted Relevance Accumulation | Aug 20, 2023 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering | Aug 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding | Mar 3, 2024 | Visual Question Answering | —Unverified | 0 | 0 |
| Generative Visual Question Answering | Jul 18, 2023 | Generative Visual Question Answering, Question Answering | —Unverified | 0 | 0 |
| Generating Triples with Adversarial Networks for Scene Graph Construction | Feb 7, 2018 | Attribute, graph construction | —Unverified | 0 | 0 |
| Generating Rationales in Visual Question Answering | Apr 4, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| InfographicVQA | Apr 26, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning | May 19, 2024 | Multimodal Reasoning, Question Answering | —Unverified | 0 | 0 |
| Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space | Apr 2, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Generating Natural Questions from Images for Multimodal Assistants | Nov 17, 2020 | Common Sense Reasoning, Natural Questions | —Unverified | 0 | 0 |
| Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention | Feb 15, 2019 | Explanation Generation, Language Modeling | —Unverified | 0 | 0 |
| Instruction-augmented Multimodal Alignment for Image-Text and Element Matching | Apr 16, 2025 | Image Augmentation, Image Generation | —Unverified | 0 | 0 |
| Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge | May 30, 2023 | Answer Selection, Question Answering | —Unverified | 0 | 0 |
| Generalized Hadamard-Product Fusion Operators for Visual Question Answering | Mar 26, 2018 | Neural Architecture Search, Question Answering | —Unverified | 0 | 0 |
| Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training | Nov 20, 2024 | Contrastive Learning, image-classification | —Unverified | 0 | 0 |
| Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs | Mar 26, 2025 | Hallucination, Hallucination Evaluation | —Unverified | 0 | 0 |
| Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models | Mar 8, 2025 | Caption Generation, Question Answering | —Unverified | 0 | 0 |
| A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading | Jul 19, 2023 | Medical Image Analysis, Question Answering | —Unverified | 0 | 0 |
| Integrating Knowledge and Reasoning in Image Understanding | Jun 24, 2019 | Object Recognition, Question Answering | —Unverified | 0 | 0 |
| Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent | Nov 8, 2024 | Autonomous Driving, Language Modeling | —Unverified | 0 | 0 |
| Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety | Jan 4, 2022 | Decoder, Deep Learning | —Unverified | 0 | 0 |
| Interactive Visual Task Learning for Robots | Dec 20, 2023 | Continual Learning, Novel Concepts | —Unverified | 0 | 0 |
| Can Generative AI Support Patients' & Caregivers' Informational Needs? Towards Task-Centric Evaluation Of AI Systems | Jan 31, 2024 | Computed Tomography (CT), Diagnostic | —Unverified | 0 | 0 |
| InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | Jul 3, 2024 | Articles, Image Comprehension | —Unverified | 0 | 0 |
| InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | Jan 29, 2024 | Form, Language Modeling | —Unverified | 0 | 0 |
| Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems | Oct 26, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | Oct 24, 2024 | image-classification, Image Classification | —Unverified | 0 | 0 |
| Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering | May 24, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Interpretable Counting for Visual Question Answering | Dec 23, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Jan 3, 2025 | Binary Classification, Face Anti-Spoofing | —Unverified | 0 | 0 |
| Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning | Feb 19, 2023 | Graph Learning, Medical Visual Question Answering | —Unverified | 0 | 0 |
| Interpretable Neural Computation for Real-World Compositional Visual Question Answering | Oct 10, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Interpretable Visual Question Answering Referring to Outside Knowledge | Mar 8, 2023 | Diversity, Image Captioning | —Unverified | 0 | 0 |
| Interpretable Visual Question Answering by Reasoning on Dependency Trees | Sep 6, 2018 | Question Answering, valid | —Unverified | 0 | 0 |
| Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining | Aug 1, 2018 | Question Answering, Visual Grounding | —Unverified | 0 | 0 |
| Interpretable Visual Question Answering via Reasoning Supervision | Sep 7, 2023 | Common Sense Reasoning, Question Answering | —Unverified | 0 | 0 |
| Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision | Aug 1, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Gender and Racial Bias in Visual Question Answering Datasets | May 17, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks | Feb 27, 2024 | Domain Generalization, Image Captioning | —Unverified | 0 | 0 |
| Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool | Mar 16, 2018 | Question Answering, Reinforcement Learning | —Unverified | 0 | 0 |
| Inverse Visual Question Answering with Multi-Level Attentions | Sep 17, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Investigating Biases in Textual Entailment Datasets | Jun 23, 2019 | BIG-bench Machine Learning, Natural Language Inference | —Unverified | 0 | 0 |