| Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space | Apr 2, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Instruction-augmented Multimodal Alignment for Image-Text and Element Matching | Apr 16, 2025 | Image Augmentation, Image Generation | Unverified | 0 |
| Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs | Mar 26, 2025 | Hallucination, Hallucination Evaluation | Unverified | 0 |
| Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models | Mar 8, 2025 | Caption Generation, Question Answering | Unverified | 0 |
| Integrating Knowledge and Reasoning in Image Understanding | Jun 24, 2019 | Object Recognition, Question Answering | Unverified | 0 |
| Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent | Nov 8, 2024 | Autonomous Driving, Language Modeling | Unverified | 0 |
| Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety | Jan 4, 2022 | Decoder, Deep Learning | Unverified | 0 |
| Interactive Visual Task Learning for Robots | Dec 20, 2023 | Continual Learning, Novel Concepts | Unverified | 0 |
| InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | Jul 3, 2024 | Articles, Image Comprehension | Unverified | 0 |
| InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | Jan 29, 2024 | Form, Language Modeling | Unverified | 0 |
| Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | Oct 24, 2024 | image-classification, Image Classification | Unverified | 0 |
| Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering | May 24, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Interpretable Counting for Visual Question Answering | Dec 23, 2017 | Question Answering, Visual Question Answering | Unverified | 0 |
| Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Jan 3, 2025 | Binary Classification, Face Anti-Spoofing | Unverified | 0 |
| Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning | Feb 19, 2023 | Graph Learning, Medical Visual Question Answering | Unverified | 0 |
| Interpretable Neural Computation for Real-World Compositional Visual Question Answering | Oct 10, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Interpretable Visual Question Answering Referring to Outside Knowledge | Mar 8, 2023 | Diversity, Image Captioning | Unverified | 0 |
| Interpretable Visual Question Answering by Reasoning on Dependency Trees | Sep 6, 2018 | Question Answering, valid | Unverified | 0 |
| Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining | Aug 1, 2018 | Question Answering, Visual Grounding | Unverified | 0 |
| Interpretable Visual Question Answering via Reasoning Supervision | Sep 7, 2023 | Common Sense Reasoning, Question Answering | Unverified | 0 |
| Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision | Aug 1, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool | Mar 16, 2018 | Question Answering, Reinforcement Learning | Unverified | 0 |
| Inverse Visual Question Answering with Multi-Level Attentions | Sep 17, 2019 | Question Answering, Visual Question Answering | Unverified | 0 |
| Investigating Biases in Textual Entailment Datasets | Jun 23, 2019 | BIG-bench Machine Learning, Natural Language Inference | Unverified | 0 |
| ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | Oct 1, 2020 | Multiple-choice, Question Answering | Unverified | 0 |
| ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | Nov 1, 2020 | Multiple-choice, Question Answering | Unverified | 0 |
| Is GPT-3 all you need for Visual Question Answering in Cultural Heritage? | Jul 25, 2022 | All, Question Answering | Unverified | 0 |
| Iterated learning for emergent systematicity in VQA | May 3, 2021 | Question Answering, Systematic Generalization | Unverified | 0 |
| It Takes Two to Tango: Towards Theory of AI's Mind | Apr 3, 2017 | Attribute, Question Answering | Unverified | 0 |
| iVQA: Inverse Visual Question Answering | Oct 10, 2017 | Question Answering, Question Generation | Unverified | 0 |
| Jaeger: A Concatenation-Based Multi-Transformer VQA Model | Oct 11, 2023 | Dimensionality Reduction, model | Unverified | 0 |
| JEEM: Vision-Language Understanding in Four Arabic Dialects | Mar 27, 2025 | Image Captioning, Question Answering | Unverified | 0 |
| Joint learning of object graph and relation graph for visual question answering | May 9, 2022 | Attribute, Graph Neural Network | Unverified | 0 |
| Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention | Apr 14, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems | Jan 1, 2025 | Question Answering, Visual Question Answering | Unverified | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks | Apr 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration | Jan 7, 2025 | Anomaly Detection, Anomaly Segmentation | Unverified | 0 |
| Kernel Pooling for Convolutional Neural Networks | Jul 1, 2017 | Face Recognition, Fine-Grained Visual Categorization | Unverified | 0 |
| Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models | Mar 26, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Knowing Where to Look? Analysis on Attention of Visual Question Answering System | Oct 9, 2018 | Question Answering, Visual Question Answering | Unverified | 0 |
| Knowledge Acquisition for Visual Question Answering via Iterative Querying | Jul 1, 2017 | Question Answering, Visual Question Answering | Unverified | 0 |
| Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings | May 3, 2025 | Question Answering, Visual Question Answering | Unverified | 0 |
| Knowledge-Based Counterfactual Queries for Visual Question Answering | Mar 5, 2023 | counterfactual, Decision Making | Unverified | 0 |
| Knowledge-Based Visual Question Answering in Videos | Apr 17, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Knowledge Condensation and Reasoning for Knowledge-based VQA | Mar 15, 2024 | Question Answering, Reading Comprehension | Unverified | 0 |
| Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering | Jun 8, 2023 | Question Answering, Retrieval | Unverified | 0 |
| KOSMOS-2.5: A Multimodal Literate Model | Sep 20, 2023 | document understanding, model | Unverified | 0 |
| KVQA: Knowledge-Aware Visual Question Answering | Jul 17, 2019 | Knowledge Graphs, Question Answering | Unverified | 0 |
| Language bias in Visual Question Answering: A Survey and Taxonomy | Nov 16, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Language Features Matter: Effective Language Representations for Vision-Language Tasks | Aug 17, 2019 | Image Captioning, Language Modelling | Unverified | 0 |