| Neural Reasoning, Fast and Slow, for Video Question Answering | Jul 10, 2019 | Natural Questions, Question Answering | —Unverified | 0 |
| Improving Automatic VQA Evaluation Using Large Language Models | Oct 4, 2023 | In-Context Learning, Question Answering | —Unverified | 0 |
| Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | Apr 15, 2022 | Contrastive Learning, Question Answering | —Unverified | 0 |
| Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning | Jan 28, 2024 | Data Augmentation, Question Answering | —Unverified | 0 |
| Hadamard product in deep learning: Introduction, Advances and Challenges | Apr 17, 2025 | Computational Efficiency, Deep Learning | —Unverified | 0 |
| AVIS: Autonomous Visual Information Seeking with Large Language Model Agent | Jun 13, 2023 | Decision Making, Language Modeling | —Unverified | 0 |
| CQ-VQA: Visual Question Answering on Categorized Questions | Feb 17, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Learning to Disambiguate by Asking Discriminative Questions | Aug 9, 2017 | Benchmarking, Image Captioning | —Unverified | 0 |
| Learning to Recognize the Unseen Visual Predicates | Sep 25, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Improving Visual Question Answering by Referring to Generated Paragraph Captions | Jun 14, 2019 | Decoder, Image Captioning | —Unverified | 0 |
| Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision | Apr 20, 2020 | counterfactual, image-classification | —Unverified | 0 |
| Improving VQA and its Explanations by Comparing Competing Explanations | Jun 28, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Leveraging Visual Question Answering to Improve Text-to-Image Synthesis | Oct 28, 2020 | Auxiliary Learning, Image Generation | —Unverified | 0 |
| Look, Learn and Leverage (L^3): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment | Aug 30, 2024 | Question Answering, Representation Learning | —Unverified | 0 |
| H2OVL-Mississippi Vision Language Models Technical Report | Oct 17, 2024 | Document AI, Visual Question Answering | —Unverified | 0 |
| CPL: Counterfactual Prompt Learning for Vision and Language Models | Oct 19, 2022 | counterfactual, image-classification | —Unverified | 0 |
| In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering | Aug 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding | Mar 3, 2024 | Visual Question Answering | —Unverified | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | —Unverified | 0 |
| CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Dec 16, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Auto-Parsing Network for Image Captioning and Visual Question Answering | Aug 24, 2021 | Image Captioning, Question Answering | —Unverified | 0 |
| Learning Sparse Mixture of Experts for Visual Question Answering | Sep 19, 2019 | Mixture-of-Experts, Question Answering | —Unverified | 0 |
| Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning | May 19, 2024 | Multimodal Reasoning, Question Answering | —Unverified | 0 |
| Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space | Apr 2, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Grounding Task Assistance with Multimodal Cues from a Single Demonstration | May 2, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Apr 2, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Instruction-augmented Multimodal Alignment for Image-Text and Element Matching | Apr 16, 2025 | Image Augmentation, Image Generation | —Unverified | 0 |
| Grounding Complex Navigational Instructions Using Scene Graphs | Jun 3, 2021 | Question Answering, reinforcement-learning | —Unverified | 0 |
| Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | May 22, 2025 | Answer Generation, Question Answering | —Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Nov 16, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Learning Sparsity for Effective and Efficient Music Performance Question Answering | Jun 2, 2025 | Audio-visual Question Answering, Question Answering | —Unverified | 0 |
| Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models | Mar 8, 2025 | Caption Generation, Question Answering | —Unverified | 0 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | Jun 20, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | Nov 6, 2023 | CoLA, Question Answering | —Unverified | 0 |
| Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent | Nov 8, 2024 | Autonomous Driving, Language Modeling | —Unverified | 0 |
| Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety | Jan 4, 2022 | Decoder, Deep Learning | —Unverified | 0 |
| Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs | May 3, 2025 | Chunking, Question Answering | —Unverified | 0 |
| Grounded Word Sense Translation | Jun 1, 2019 | Grounded language learning, Machine Translation | —Unverified | 0 |
| Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray | Apr 23, 2024 | Medical Visual Question Answering, Question Answering | —Unverified | 0 |
| InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | Jan 29, 2024 | Form, Language Modeling | —Unverified | 0 |
| Learning Rich Image Region Representation for Visual Question Answering | Oct 29, 2019 | Language Modeling, Language Modelling | —Unverified | 0 |
| GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions | May 24, 2023 | Object, Question Answering | —Unverified | 0 |
| Counterfactual Vision and Language Learning | Jun 1, 2020 | counterfactual, Question Answering | —Unverified | 0 |
| Interpretable Counting for Visual Question Answering | Dec 23, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Oct 21, 2024 | Instruction Following, object-detection | —Unverified | 0 |
| Analysis on Image Set Visual Question Answering | Mar 31, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback | Mar 19, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Graph-Structured Representations for Visual Question Answering | Sep 19, 2016 | Multiple-choice, Question Answering | —Unverified | 0 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | Nov 11, 2021 | Graph Attention, Question Answering | —Unverified | 0 |
| Bilinear Graph Networks for Visual Question Answering | Jul 23, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |