| A Comprehensive Survey on Visual Question Answering Datasets and Algorithms | Nov 17, 2024 | Diagnostic, Miscellaneous | —Unverified | 0 | 0 |
| Unanswerable Questions about Images and Texts | Jan 25, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks | Dec 6, 2019 | Image Retrieval, Inductive Bias | —Unverified | 0 | 0 |
| Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning | Nov 21, 2017 | Question Answering, Reinforcement Learning | —Unverified | 0 | 0 |
| Grounding Task Assistance with Multimodal Cues from a Single Demonstration | May 2, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Uncertainty based Class Activation Maps for Visual Question Answering | Jan 23, 2020 | Deep Learning, Probabilistic Deep Learning | —Unverified | 0 | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | —Unverified | 0 | 0 |
| H2OVL-Mississippi Vision Language Models Technical Report | Oct 17, 2024 | Document AI, Visual Question Answering | —Unverified | 0 | 0 |
| Grounding Complex Navigational Instructions Using Scene Graphs | Jun 3, 2021 | Question Answering, reinforcement-learning | —Unverified | 0 | 0 |
| Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | May 22, 2025 | Answer Generation, Question Answering | —Unverified | 0 | 0 |
| Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base | Nov 16, 2021 | Question Answering, Semantic Similarity | —Unverified | 0 | 0 |
| Hadamard product in deep learning: Introduction, Advances and Challenges | Apr 17, 2025 | Computational Efficiency, Deep Learning | —Unverified | 0 | 0 |
| Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base | Jul 27, 2022 | Question Answering, Semantic Similarity | —Unverified | 0 | 0 |
| Uncovering Bias in Large Vision-Language Models with Counterfactuals | Mar 29, 2024 | counterfactual, Question Answering | —Unverified | 0 | 0 |
| Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Jun 8, 2025 | Attribute, Hallucination | —Unverified | 0 | 0 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | Jun 20, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals | May 30, 2024 | counterfactual, Question Answering | —Unverified | 0 | 0 |
| A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis | Oct 31, 2023 | Descriptive, Medical Image Analysis | —Unverified | 0 | 0 |
| Grounded Word Sense Translation | Jun 1, 2019 | Grounded language learning, Machine Translation | —Unverified | 0 | 0 |
| HAMMR: HierArchical MultiModal React agents for generic VQA | Apr 8, 2024 | Optical Character Recognition (OCR), Question Answering | —Unverified | 0 | 0 |
| Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | Jun 2, 2025 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | Nov 1, 2019 | Caption Generation, Question Answering | —Unverified | 0 | 0 |
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification, image-classification | —Unverified | 0 | 0 |
| Understanding Attention for Vision-and-Language Tasks | Dec 17, 2021 | Image Generation, Image Retrieval | —Unverified | 0 | 0 |
| Hierarchical Graph Attention Network for Few-Shot Visual-Semantic Learning | Jan 1, 2021 | Graph Attention, Image Captioning | —Unverified | 0 | 0 |