| Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering | Apr 24, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| FVQA 2.0: Introducing Adversarial Samples into Fact-based Visual Question Answering | Mar 19, 2023 | Common Sense Reasoning, Information Retrieval | —Unverified | 0 |
| FVQA: Fact-based Visual Question Answering | Jun 17, 2016 | Common Sense Reasoning, Question Answering | —Unverified | 0 |
| Gamified crowd-sourcing of high-quality data for visual fine-tuning | Oct 5, 2024 | Visual Question Answering | —Unverified | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | May 25, 2025 | Caption Generation, Question Answering | —Unverified | 0 |
| GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis | Nov 25, 2024 | Medical Visual Question Answering, Multiple-choice | —Unverified | 0 |
| GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning | Jun 22, 2025 | Answer Generation, Decision Making | —Unverified | 0 |
| Gemini Pro Defeated by GPT-4V: Evidence from Education | Dec 27, 2023 | image-classification, Image Classification | —Unverified | 0 |
| Gender and Racial Bias in Visual Question Answering Datasets | May 17, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems | Oct 26, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Generalized Hadamard-Product Fusion Operators for Visual Question Answering | Mar 26, 2018 | Neural Architecture Search, Question Answering | —Unverified | 0 |
| Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge | May 30, 2023 | Answer Selection, Question Answering | —Unverified | 0 |
| Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention | Feb 15, 2019 | Explanation Generation, Language Modeling | —Unverified | 0 |
| Generating Natural Questions from Images for Multimodal Assistants | Nov 17, 2020 | Common Sense Reasoning, Natural Questions | —Unverified | 0 |
| Generating Rationales in Visual Question Answering | Apr 4, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Generating Triples with Adversarial Networks for Scene Graph Construction | Feb 7, 2018 | Attribute, graph construction | —Unverified | 0 |
| Generative Visual Question Answering | Jul 18, 2023 | Generative Visual Question Answering, Question Answering | —Unverified | 0 |
| Generic Attention-model Explainability by Weighted Relevance Accumulation | Aug 20, 2023 | Image Captioning, Question Answering | —Unverified | 0 |
| GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing | Jan 12, 2025 | Image Captioning, Language Modeling | —Unverified | 0 |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Mar 16, 2025 | Change Detection, Image Captioning | —Unverified | 0 |
| GiVE: Guiding Visual Encoder to Perceive Overlooked Information | Oct 26, 2024 | Object, Question Answering | —Unverified | 0 |
| γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models | Oct 17, 2024 | Visual Question Answering | —Unverified | 0 |
| Goal-Oriented Semantic Communication for Wireless Visual Question Answering | Nov 3, 2024 | Edge-computing, Question Answering | —Unverified | 0 |
| Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | Oct 21, 2019 | Data Augmentation, Decision Making | —Unverified | 0 |
| GPT-4V Explorations: Mining Autonomous Driving | Jun 24, 2024 | Autonomous Driving, Decision Making | —Unverified | 0 |
| GRADE: Quantifying Sample Diversity in Text-to-Image Models | Oct 29, 2024 | Attribute, Diversity | —Unverified | 0 |
| GRAM: Global Reasoning for Multi-Page VQA | Jan 7, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network | Sep 30, 2020 | Heuristic Search, Question Answering | —Unverified | 0 |
| Graph Neural Networks in Vision-Language Image Understanding: A Survey | Mar 7, 2023 | Image Captioning, Image Retrieval | —Unverified | 0 |
| Bilinear Graph Networks for Visual Question Answering | Jul 23, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | Nov 11, 2021 | Graph Attention, Question Answering | —Unverified | 0 |
| Graph-Structured Representations for Visual Question Answering | Sep 19, 2016 | Multiple-choice, Question Answering | —Unverified | 0 |
| GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback | Mar 19, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Oct 21, 2024 | Instruction Following, object-detection | —Unverified | 0 |
| GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions | May 24, 2023 | Object, Question Answering | —Unverified | 0 |
| Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray | Apr 23, 2024 | Medical Visual Question Answering, Question Answering | —Unverified | 0 |
| Grounded Word Sense Translation | Jun 1, 2019 | Grounded language learning, Machine Translation | —Unverified | 0 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | Jun 20, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | May 22, 2025 | Answer Generation, Question Answering | —Unverified | 0 |
| Grounding Complex Navigational Instructions Using Scene Graphs | Jun 3, 2021 | Question Answering, reinforcement-learning | —Unverified | 0 |
| Grounding Task Assistance with Multimodal Cues from a Single Demonstration | May 2, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | —Unverified | 0 |
| H2OVL-Mississippi Vision Language Models Technical Report | Oct 17, 2024 | Document AI, Visual Question Answering | —Unverified | 0 |
| Hadamard product in deep learning: Introduction, Advances and Challenges | Apr 17, 2025 | Computational Efficiency, Deep Learning | —Unverified | 0 |
| Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Jun 8, 2025 | Attribute, Hallucination | —Unverified | 0 |
| HAMMR: HierArchical MultiModal React agents for generic VQA | Apr 8, 2024 | Optical Character Recognition (OCR), Question Answering | —Unverified | 0 |
| Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | Jun 2, 2025 | Multiple-choice, Question Answering | —Unverified | 0 |
| Hierarchical Graph Attention Network for Few-Shot Visual-Semantic Learning | Jan 1, 2021 | Graph Attention, Image Captioning | —Unverified | 0 |
| Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion | Apr 4, 2025 | Diagnostic, Medical Visual Question Answering | —Unverified | 0 |
| HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision | Apr 15, 2024 | Object, Question Answering | —Unverified | 0 |