| PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models | Oct 9, 2024 | Question Answering, Retrieval | —Unverified | 0 | 0 |
| Retrieving Visual Facts For Few-Shot Visual Question Answering | Jan 16, 2022 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Reusable Slotwise Mechanisms | Feb 21, 2023 | Future prediction, Object | —Unverified | 0 | 0 |
| Visual Question Answering in the Medical Domain | Sep 20, 2023 | Contrastive Learning, Medical Visual Question Answering | —Unverified | 0 | 0 |
| Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads | Apr 30, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Visual Question Answering on 360° Images | Jan 10, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding, Optical Character Recognition | —Unverified | 0 | 0 |
| Visual Question Answering on Image Sets | Aug 27, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla | Oct 19, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| ReWind: Understanding Long Videos with Instructed Learnable Memory | Nov 23, 2024 | Large Language Model, Question Answering | —Unverified | 0 | 0 |
| Visual Question Answering on Multiple Remote Sensing Image Modalities | May 21, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding | Jun 4, 2025 | Negation, Negation Detection | —Unverified | 0 | 0 |
| A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering | May 22, 2025 | counterfactual, Medical Visual Question Answering | —Unverified | 0 | 0 |
| CHIC: Corporate Document for Visual question Answering | May 1, 2023 | Information Retrieval, Question Answering | —Unverified | 0 | 0 |
| RL-CSDia: Representation Learning of Computer Science Diagrams | Mar 10, 2021 | Question Answering, Representation Learning | —Unverified | 0 | 0 |
| Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations | Sep 27, 2024 | Chart Question Answering, Question Answering | —Unverified | 0 | 0 |
| R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | Oct 27, 2024 | Medical Visual Question Answering, Multiple-choice | —Unverified | 0 | 0 |
| RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases | Jan 1, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation, Descriptive | —Unverified | 0 | 0 |
| RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis | Feb 25, 2024 | Code Generation, Multimodal Reasoning | —Unverified | 0 | 0 |
| RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation | Jun 6, 2024 | Common Sense Reasoning, Mamba | —Unverified | 0 | 0 |
| Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | Sep 26, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 | 0 |
| Visual Question Answering Using Semantic Information from Image Descriptions | Apr 23, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Characterizing Misclassifications of Deep NLP Models | Mar 12, 2021 | named-entity-recognition, Named Entity Recognition | —Unverified | 0 | 0 |
| Robustness Analysis of Visual QA Models by Basic Questions | Sep 14, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Mar 10, 2025 | Autonomous Driving, Question Answering | —Unverified | 0 | 0 |
| Robust Visual Question Answering: Datasets, Methods, and Future Challenges | Jul 21, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Robust Visual Reasoning via Language Guided Neural Module Networks | Dec 1, 2021 | Question Answering, Referring Expression | —Unverified | 0 | 0 |
| Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset | Oct 8, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Visual Question Answering (VQA) on Images with Superimposed Text | Jun 13, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Abduction of Domain Relationships from Data for VQA | Feb 13, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Chain of Thought Prompt Tuning in Vision Language Models | Apr 16, 2023 | Domain Generalization, image-classification | —Unverified | 0 | 0 |
| RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering | Nov 3, 2024 | Descriptive, Image Captioning | —Unverified | 0 | 0 |
| RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model | Apr 7, 2025 | Image Captioning, image-classification | —Unverified | 0 | 0 |
| Chain of Reasoning for Visual Question Answering | Dec 1, 2018 | Object, Question Answering | —Unverified | 0 | 0 |
| Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning | Feb 26, 2024 | Data Augmentation, document understanding | —Unverified | 0 | 0 |
| RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval | —Unverified | 0 | 0 |
| RSVQA: Visual Question Answering for Remote Sensing Data | Mar 16, 2020 | Land Cover Classification, Object Counting | —Unverified | 0 | 0 |
| Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness | Jul 2, 2024 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs | Dec 3, 2024 | Image Captioning, Quantization | —Unverified | 0 | 0 |
| Visual Question Answering with Memory-Augmented Networks | Jul 17, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Visual Question Answering with Prior Class Semantics | May 4, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | —Unverified | 0 | 0 |
| Visual Question Answering with Question Representation Update (QRU) | Dec 1, 2016 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| SAR Strikes Back: A New Hope for RSVQA | Jan 14, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering | Nov 7, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | —Unverified | 0 | 0 |
| Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering | Oct 9, 2023 | Answer Generation, Question Answering | —Unverified | 0 | 0 |
| Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models | Dec 9, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering | Jan 25, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Categorizing Concepts With Basic Level for Vision-to-Language | Jun 1, 2018 | Clustering, Image Captioning | —Unverified | 0 | 0 |