| Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering | Aug 31, 2020 | Knowledge Graphs, Question Answering | Unverified | 0 | 0 |
| Cross-Modal Generative Augmentation for Visual Question Answering | May 11, 2021 | Data Augmentation, Question Answering | Unverified | 0 | 0 |
| A Focused Dynamic Attention Model for Visual Question Answering | Apr 6, 2016 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text Retrieval, Question Answering | Unverified | 0 | 0 |
| Crossformer: Transformer with Alternated Cross-Layer Guidance | Sep 29, 2021 | Inductive Bias, Machine Translation | Unverified | 0 | 0 |
| Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference | Sep 25, 2019 | Common Sense Reasoning, Question Answering | Unverified | 0 | 0 |
| Cross-Dataset Adaptation for Visual Question Answering | Jun 10, 2018 | Domain Adaptation, Question Answering | Unverified | 0 | 0 |
| CROME: Cross-Modal Adapters for Efficient Multimodal LLM | Aug 13, 2024 | Instruction Following, Language Modeling | Unverified | 0 | 0 |
| POINTS: Improving Your Vision-language Model with Affordable Strategies | Sep 7, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Polar-VQA: Visual Question Answering on Remote Sensed Ice sheet Imagery from Polar Region | Mar 13, 2023 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| CREPE: Coordinate-Aware End-to-End Document Parser | May 1, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? | Dec 27, 2021 | Articles, Medical Visual Question Answering | Unverified | 0 | 0 |
| Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models | Jun 14, 2024 | Decoder, Knowledge Graphs | Unverified | 0 | 0 |
| CQ-VQA: Visual Question Answering on Categorized Questions | Feb 17, 2020 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Predicting Relative Depth between Objects from Semantic Features | Jan 12, 2021 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| CPL: Counterfactual Prompt Learning for Vision and Language Models | Oct 19, 2022 | counterfactual, image-classification | Unverified | 0 | 0 |
| PreSTU: Pre-Training for Scene-Text Understanding | Sep 12, 2022 | Decoder, Image Captioning | Unverified | 0 | 0 |
| Pre-training image-language transformers for open-vocabulary tasks | Sep 9, 2022 | Question Answering, Visual Entailment | Unverified | 0 | 0 |
| Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs | Apr 23, 2024 | Question Answering, Retrieval | Unverified | 0 | 0 |
| A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions | Jun 13, 2025 | Conformal Prediction, Question Answering | Unverified | 0 | 0 |
| CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Dec 16, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Apr 2, 2022 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Privacy Preserving Visual Question Answering | Feb 15, 2022 | Privacy Preserving, Question Answering | Unverified | 0 | 0 |
| Aesthetic Visual Question Answering of Photographs | Aug 10, 2022 | Question Answering, Sentiment Analysis | Unverified | 0 | 0 |
| Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering | Feb 21, 2019 | counterfactual, Question Answering | Unverified | 0 | 0 |