| Interpretable Visual Question Answering via Reasoning Supervision | Sep 7, 2023 | Common Sense Reasoning, Question Answering | —Unverified | 0 |
| Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision | Aug 1, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback | Mar 19, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Graph-Structured Representations for Visual Question Answering | Sep 19, 2016 | Multiple-choice, Question Answering | —Unverified | 0 |
| Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool | Mar 16, 2018 | Question Answering, Reinforcement Learning | —Unverified | 0 |
| Inverse Visual Question Answering with Multi-Level Attentions | Sep 17, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | Nov 11, 2021 | Graph Attention, Question Answering | —Unverified | 0 |
| Bilinear Graph Networks for Visual Question Answering | Jul 23, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Analysis of Visual Question Answering Algorithms with attention model | May 4, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Graph Neural Networks in Vision-Language Image Understanding: A Survey | Mar 7, 2023 | Image Captioning, Image Retrieval | —Unverified | 0 |
| A Unified Framework for Multilingual and Code-Mixed Visual Question Answering | Dec 1, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | Oct 1, 2020 | Multiple-choice, Question Answering | —Unverified | 0 |
| ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | Nov 1, 2020 | Multiple-choice, Question Answering | —Unverified | 0 |
| Is GPT-3 all you need for Visual Question Answering in Cultural Heritage? | Jul 25, 2022 | All, Question Answering | —Unverified | 0 |
| Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network | Sep 30, 2020 | Heuristic Search, Question Answering | —Unverified | 0 |
| LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing | Jun 1, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| It Takes Two to Tango: Towards Theory of AI's Mind | Apr 3, 2017 | Attribute, Question Answering | —Unverified | 0 |
| iVQA: Inverse Visual Question Answering | Oct 10, 2017 | Question Answering, Question Generation | —Unverified | 0 |
| GRAM: Global Reasoning for Multi-Page VQA | Jan 7, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 |
| GRADE: Quantifying Sample Diversity in Text-to-Image Models | Oct 29, 2024 | Attribute, Diversity | —Unverified | 0 |
| LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation | Jul 9, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Linguistically Driven Graph Capsule Network for Visual Question Reasoning | Mar 23, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference | Nov 15, 2024 | Quantization, Question Answering | —Unverified | 0 |
| GPT-4V Explorations: Mining Autonomous Driving | Jun 24, 2024 | Autonomous Driving, Decision Making | —Unverified | 0 |
| Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning | Jun 8, 2025 | Medical Report Generation, Question Answering | —Unverified | 0 |
| Joint learning of object graph and relation graph for visual question answering | May 9, 2022 | Attribute, Graph Neural Network | —Unverified | 0 |
| Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering | Jan 1, 2021 | Novel Concepts, Question Answering | —Unverified | 0 |
| Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention | Apr 14, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| 利用图像描述与知识图谱增强表示的视觉问答 (Exploiting Image Captions and External Knowledge as Representation Enhancement for Visual Question Answering) | Aug 1, 2021 | Image Captioning, Question Answering | —Unverified | 0 |
| JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems | Jan 1, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | Oct 21, 2019 | Data Augmentation, Decision Making | —Unverified | 0 |
| Lightweight In-Context Tuning for Multimodal Unified Models | Oct 8, 2023 | Image Captioning, In-Context Learning | —Unverified | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks | Apr 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration | Jan 7, 2025 | Anomaly Detection, Anomaly Segmentation | —Unverified | 0 |
| Goal-Oriented Semantic Communication for Wireless Visual Question Answering | Nov 3, 2024 | Edge-computing, Question Answering | —Unverified | 0 |
| Kernel Pooling for Convolutional Neural Networks | Jul 1, 2017 | Face Recognition, Fine-Grained Visual Categorization | —Unverified | 0 |
| γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models | Oct 17, 2024 | Visual Question Answering | —Unverified | 0 |
| Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models | Mar 26, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| A Multimodal Social Agent | Dec 11, 2024 | Common Sense Reasoning, Decision Making | —Unverified | 0 |
| Knowing Where to Look? Analysis on Attention of Visual Question Answering System | Oct 9, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Knowledge Acquisition for Visual Question Answering via Iterative Querying | Jul 1, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings | May 3, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? | Dec 27, 2021 | Articles, Medical Visual Question Answering | —Unverified | 0 |
| Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering | Apr 16, 2024 | Language Modelling, Prediction | —Unverified | 0 |
| GiVE: Guiding Visual Encoder to Perceive Overlooked Information | Oct 26, 2024 | Object, Question Answering | —Unverified | 0 |
| Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering | Jun 8, 2023 | Question Answering, Retrieval | —Unverified | 0 |
| Connecting Language and Vision to Actions | Jul 1, 2018 | Image Captioning, Language Modeling | —Unverified | 0 |
| Attentive Explanations: Justifying Decisions and Pointing to the Evidence | Dec 14, 2016 | Decision Making, Question Answering | —Unverified | 0 |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Mar 16, 2025 | Change Detection, Image Captioning | —Unverified | 0 |
| GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing | Jan 12, 2025 | Image Captioning, Language Modeling | —Unverified | 0 |