| Question-Driven Graph Fusion Network For Visual Question Answering | Apr 3, 2022 | Graph Attention, Object | —Unverified | 0 |
| Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding | Jan 24, 2022 | Question Answering, Question Generation | —Unverified | 0 |
| Question-Guided Hybrid Convolution for Visual Question Answering | Aug 8, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Question Guided Modular Routing Networks for Visual Question Answering | Apr 17, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Question-Led Semantic Structure Enhanced Attentions for VQA | Nov 16, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Question Modifiers in Visual Question Answering | Jun 1, 2022 | Natural Language Understanding, Question Answering | —Unverified | 0 |
| Question Relevance in Visual Question Answering | Jul 23, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions | Jun 21, 2016 | Question Answering, Question Similarity | —Unverified | 0 |
| Question Type Guided Attention in Visual Question Answering | Apr 6, 2018 | Activity Recognition, Question Answering | —Unverified | 0 |
| Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning | Sep 12, 2023 | Autonomous Vehicles, Question Answering | —Unverified | 0 |
| Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels | Dec 9, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 |
| RAVEN: A Dataset for Relational and Analogical Visual rEasoNing | Mar 7, 2019 | Object Recognition, Question Answering | —Unverified | 0 |
| Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling | Aug 14, 2019 | Question Answering, Scene-Aware Dialogue | —Unverified | 0 |
| Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI | May 12, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Reasoning Over History: Context Aware Visual Dialog | Nov 2, 2020 | coreference-resolution, Coreference Resolution | —Unverified | 0 |
| Recent, rapid advancement in visual question answering architecture: a review | Mar 2, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Reciprocal Attention Fusion for Visual Question Answering | May 11, 2018 | Object, Question Answering | —Unverified | 0 |
| Recurrent and Contextual Models for Visual Question Answering | Mar 23, 2017 | Diversity, Multiple-choice | —Unverified | 0 |
| Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts | Dec 21, 2023 | Hallucination, Question Answering | —Unverified | 0 |
| Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder | Jul 13, 2020 | Question Answering, Visual Grounding | —Unverified | 0 |
| Regularizing Attention Networks for Anomaly Detection in Visual Question Answering | Sep 21, 2020 | Anomaly Detection, Question Answering | —Unverified | 0 |
| ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | Jul 7, 2025 | Hallucination, Question Answering | —Unverified | 0 |
| Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | Dec 12, 2023 | image-classification, Image Classification | —Unverified | 0 |
| Rephrasing visual questions by specifying the entropy of the answer distribution | Apr 10, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Representation, Learning and Reasoning on Spatial Language for Downstream NLP Tasks | Nov 1, 2020 | Common Sense Reasoning, Question Answering | —Unverified | 0 |
| Representing Movie Characters in Dialogues | Nov 1, 2019 | Question Answering, Relation Classification | —Unverified | 0 |
| Reproducibility Report for "Learning To Count Objects In Natural Images For Visual Question Answering" | May 21, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 |
| RepsNet: Combining Vision with Language for Automated Medical Reports | Sep 27, 2022 | Contrastive Learning, Decoder | —Unverified | 0 |
| RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents | Oct 17, 2024 | Question Answering, Task Planning | —Unverified | 0 |
| Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | May 24, 2022 | Image Captioning, Out-of-Distribution Generalization | —Unverified | 0 |
| Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | Jul 5, 2024 | Instance Segmentation, Optical Character Recognition (OCR) | —Unverified | 0 |
| VrR-VG: Refocusing Visually-Relevant Relationships | Feb 1, 2019 | Image Captioning, Question Answering | —Unverified | 0 |
| Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering | Aug 30, 2024 | Decoder, Language Modeling | —Unverified | 0 |
| Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | Feb 23, 2025 | Answer Generation, Language Modeling | —Unverified | 0 |
| PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models | Oct 9, 2024 | Question Answering, Retrieval | —Unverified | 0 |
| Retrieving Visual Facts For Few-Shot Visual Question Answering | Jan 16, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| Reusable Slotwise Mechanisms | Feb 21, 2023 | Future prediction, Object | —Unverified | 0 |
| Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding, Optical Character Recognition | —Unverified | 0 |
| ReWind: Understanding Long Videos with Instructed Learnable Memory | Nov 23, 2024 | Large Language Model, Question Answering | —Unverified | 0 |
| ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding | Jun 4, 2025 | Negation, Negation Detection | —Unverified | 0 |
| RL-CSDia: Representation Learning of Computer Science Diagrams | Mar 10, 2021 | Question Answering, Representation Learning | —Unverified | 0 |
| R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | Oct 27, 2024 | Medical Visual Question Answering, Multiple-choice | —Unverified | 0 |
| RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases | Jan 1, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation, Descriptive | —Unverified | 0 |
| RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis | Feb 25, 2024 | Code Generation, Multimodal Reasoning | —Unverified | 0 |
| RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation | Jun 6, 2024 | Common Sense Reasoning, Mamba | —Unverified | 0 |
| Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | Sep 26, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 |
| Robustness Analysis of Visual QA Models by Basic Questions | Sep 14, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Mar 10, 2025 | Autonomous Driving, Question Answering | —Unverified | 0 |
| Robust Visual Question Answering: Datasets, Methods, and Future Challenges | Jul 21, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |