| Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Mar 10, 2025 | Autonomous Driving, Question Answering | —Unverified | 0 | 0 |
| Robust Visual Question Answering: Datasets, Methods, and Future Challenges | Jul 21, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Robust Visual Reasoning via Language Guided Neural Module Networks | Dec 1, 2021 | Question Answering, Referring Expression | —Unverified | 0 | 0 |
| Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset | Oct 8, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Visual Question Answering (VQA) on Images with Superimposed Text | Jun 13, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Abduction of Domain Relationships from Data for VQA | Feb 13, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Chain of Thought Prompt Tuning in Vision Language Models | Apr 16, 2023 | Domain Generalization, image-classification | —Unverified | 0 | 0 |
| RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering | Nov 3, 2024 | Descriptive, Image Captioning | —Unverified | 0 | 0 |
| RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model | Apr 7, 2025 | Image Captioning, image-classification | —Unverified | 0 | 0 |
| Chain of Reasoning for Visual Question Answering | Dec 1, 2018 | Object, Question Answering | —Unverified | 0 | 0 |
| Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning | Feb 26, 2024 | Data Augmentation, document understanding | —Unverified | 0 | 0 |
| RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval | —Unverified | 0 | 0 |
| RSVQA: Visual Question Answering for Remote Sensing Data | Mar 16, 2020 | Land Cover Classification, Object Counting | —Unverified | 0 | 0 |
| Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness | Jul 2, 2024 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs | Dec 3, 2024 | Image Captioning, Quantization | —Unverified | 0 | 0 |
| Visual Question Answering with Memory-Augmented Networks | Jul 17, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Visual Question Answering with Prior Class Semantics | May 4, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | —Unverified | 0 | 0 |
| Visual Question Answering with Question Representation Update (QRU) | Dec 1, 2016 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| SAR Strikes Back: A New Hope for RSVQA | Jan 14, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering | Nov 7, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | —Unverified | 0 | 0 |
| Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering | Oct 9, 2023 | Answer Generation, Question Answering | —Unverified | 0 | 0 |
| Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models | Dec 9, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering | Jan 25, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Categorizing Concepts With Basic Level for Vision-to-Language | Jun 1, 2018 | Clustering, Image Captioning | —Unverified | 0 | 0 |