| Adapting Lightweight Vision Language Models for Radiological Visual Question Answering | Jun 17, 2025 | Diagnostic, Question Answering | Code Available | 0 |
| Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering | Mar 24, 2022 | GPU, Question Answering | Code Available | 0 |
| BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection | Jan 31, 2019 | Question Answering, Relationship Detection | Code Available | 0 |
| Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering | Dec 1, 2017 | Question Answering, Visual Question Answering | Code Available | 0 |
| Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study | Mar 21, 2025 | Attribute, Mathematical Problem-Solving | Code Available | 0 |
| Resource-efficient Inference with Foundation Model Programs | Apr 9, 2025 | model, Question Answering | Code Available | 0 |
| Is Multimodal Vision Supervision Beneficial to Language? | Feb 10, 2023 | Image Retrieval, Natural Language Understanding | Code Available | 0 |
| DocMIA: Document-Level Membership Inference Attacks against DocVQA Models | Feb 6, 2025 | document understanding, Inference Attack | Code Available | 0 |
| DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness | Nov 29, 2024 | Optical Character Recognition (OCR), Question Answering | Code Available | 0 |
| Zero-shot Visual Question Answering with Language Model Feedback | May 26, 2023 | Language Modeling, Language Modelling | Code Available | 0 |
| Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering | Nov 17, 2015 | Image Captioning, Question Answering | Code Available | 0 |
| IQ-VQA: Intelligent Visual Question Answering | Jul 8, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Discrete Subgraph Sampling for Interpretable Graph based Visual Question Answering | Dec 11, 2024 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | Code Available | 0 |
| A simple neural network module for relational reasoning | Jun 5, 2017 | Image Retrieval with Multi-Modal Query, Question Answering | Code Available | 0 |
| Towards Knowledge-Augmented Visual Question Answering | Dec 1, 2020 | General Knowledge, Graph Attention | Code Available | 0 |
| Towards Language-guided Visual Recognition via Dynamic Convolutions | Oct 17, 2021 | Question Answering, Referring Expression | Code Available | 0 |
| REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory | Dec 10, 2022 | Image Captioning, Language Modeling | Code Available | 0 |
| IQA: Visual Question Answering in Interactive Environments | Dec 9, 2017 | Navigate, Reinforcement Learning | Code Available | 0 |
| Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models | Jun 28, 2025 | image-classification, Image Classification | Code Available | 0 |
| Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering | Sep 13, 2021 | Data Augmentation, Question Answering | Code Available | 0 |
| Revisiting Visual Question Answering Baselines | Jun 27, 2016 | Binary Classification, Multiple-choice | Code Available | 0 |
| iParaphrasing: Extracting Visually Grounded Paraphrases via an Image | Jun 12, 2018 | Image Captioning, Question Answering | Code Available | 0 |
| BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA | Mar 4, 2025 | Medical Diagnosis, Question Answering | Code Available | 0 |
| REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering | Jul 27, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Towards Multilingual Audio-Visual Question Answering | Jun 13, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 0 |
| Right this way: Can VLMs Guide Us to See More to Answer Questions? | Nov 1, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering | Mar 26, 2024 | Decision Making, Explainable artificial intelligence | Code Available | 0 |
| Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following | Jun 4, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD | Apr 9, 2024 | 4k, Language Modeling | Code Available | 0 |
| An Entropy Clustering Approach for Assessing Visual Question Difficulty | Apr 12, 2020 | Clustering, Question Answering | Code Available | 0 |
| Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset | Nov 21, 2024 | Question Answering, Visual Grounding | Code Available | 0 |
| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description, Multiple-choice | Code Available | 0 |
| Visual Coreference Resolution in Visual Dialog using Neural Module Networks | Sep 6, 2018 | Common Sense Reasoning, coreference-resolution | Code Available | 0 |
| BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models | Jan 28, 2023 | Out-of-Distribution Generalization, Question Answering | Code Available | 0 |
| A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models | Aug 2, 2017 | Question Answering, Visual Question Answering | Code Available | 0 |
| Robust Explanations for Visual Question Answering | Jan 23, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering | Mar 22, 2023 | Question Answering, Visual Question Answering | Code Available | 0 |
| A Simple Baseline for Knowledge-Based Visual Question Answering | Oct 20, 2023 | In-Context Learning, Question Answering | Code Available | 0 |
| Differential Attention for Visual Question Answering | Apr 1, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |
| VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering | Dec 19, 2023 | Image Retrieval, Question Answering | Code Available | 0 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code Available | 0 |
| Instruction Makes a Difference | Feb 1, 2024 | Hallucination, Instruction Following | Code Available | 0 |
| Routing Networks and the Challenges of Modular and Compositional Computation | Apr 29, 2019 | Language Modeling, Language Modelling | Code Available | 0 |
| RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering | Oct 19, 2023 | Image Captioning, Question Answering | Code Available | 0 |
| Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs | Oct 26, 2023 | Attribute, Machine Translation | Code Available | 0 |
| Did the Model Understand the Question? | May 14, 2018 | model, Question Answering | Code Available | 0 |
| Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Jun 15, 2024 | Question Answering, Video Understanding | Code Available | 0 |
| Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts | Nov 15, 2023 | Question Answering, Sentence | Code Available | 0 |
| Improving the Cross-Lingual Generalisation in Visual Question Answering | Sep 7, 2022 | Cross-Lingual Transfer, Question Answering | Code Available | 0 |