| How Modular Should Neural Module Networks Be for Systematic Generalization? | Jun 15, 2021 | Question Answering, Systematic Generalization | Code Available | 0 |
| Targeted Visual Prompting for Medical Visual Question Answering | Aug 6, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 0 |
| Self Supervision for Attention Networks | Jan 6, 2021 | image-classification, Image Classification | Code Available | 0 |
| VQA Therapy: Exploring Answer Differences by Visually Grounding Answers | Aug 21, 2023 | Question Answering, Visual Question Answering | Code Available | 0 |
| UMIT: Unifying Medical Imaging Tasks via Vision-Language Models | Mar 20, 2025 | Diagnostic, Medical Image Analysis | Code Available | 0 |
| Semantically Equivalent Adversarial Rules for Debugging NLP models | Jul 1, 2018 | Data Augmentation, Question Answering | Code Available | 0 |
| Alignment Attention by Matching Key and Query Distributions | Oct 25, 2021 | Graph Attention, Question Answering | Code Available | 0 |
| UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models | Oct 17, 2023 | Attribute, Question Answering | Code Available | 0 |
| Deep Modular Co-Attention Networks for Visual Question Answering | Jun 25, 2019 | Question Answering, Visual Question Answering | Code Available | 0 |
| High-Order Attention Models for Visual Question Answering | Nov 12, 2017 | Question Answering, Visual Question Answering | Code Available | 0 |
| 12-in-1: Multi-Task Vision and Language Representation Learning | Dec 5, 2019 | 10-shot image generation, Image Retrieval | Code Available | 0 |
| Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations | May 15, 2019 | Image Captioning, Question Answering | Code Available | 0 |
| Separate and Locate: Rethink the Text in Text-based Visual Question Answering | Aug 31, 2023 | Optical Character Recognition (OCR), Position | Code Available | 0 |
| Hierarchical Deep Multi-modal Network for Medical Visual Question Answering | Sep 27, 2020 | Descriptive, Medical Visual Question Answering | Code Available | 0 |
| Visual Question Answering: Datasets, Algorithms, and Future Challenges | Oct 5, 2016 | Question Answering, Visual Question Answering | Code Available | 0 |
| Are VLMs Really Blind | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests | Dec 3, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| ShapeWorld - A new test methodology for multimodal language understanding | Apr 14, 2017 | Multimodal Deep Learning, Visual Question Answering | Code Available | 0 |
| ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Nov 21, 2023 | Descriptive, MME | Code Available | 0 |
| Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? | Oct 17, 2024 | All, Language Modeling | Code Available | 0 |
| Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog | Oct 11, 2023 | Question Answering, Response Generation | Code Available | 0 |
| Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering | Apr 11, 2017 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 |
| HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language | May 28, 2023 | Machine Translation, Multimodal Machine Translation | Code Available | 0 |
| HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models | Dec 29, 2024 | Hallucination, Object | Code Available | 0 |
| Uncovering the Full Potential of Visual Grounding Methods in VQA | Jan 15, 2024 | Question Answering, Visual Grounding | Code Available | 0 |