| Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images | Feb 8, 2024 | Image CaptioningQuestion Answering | CodeCode Available | 0 | 5 |
| MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | Jan 11, 2020 | Image CaptioningImage-text Retrieval | CodeCode Available | 0 | 5 |
| Kvasir-VQA: A Text-Image Pair GI Tract Dataset | Sep 2, 2024 | Image CaptioningImage Generation | CodeCode Available | 0 | 5 |
| Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Jun 11, 2025 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 0 | 5 |
| Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Aug 16, 2024 | Decision MakingMedical Visual Question Answering | CodeCode Available | 0 | 5 |
| Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts | Jun 25, 2024 | FairnessQuestion Answering | CodeCode Available | 0 | 5 |
| ArtQuest: Countering Hidden Language Biases in ArtVQA | Jan 4, 2024 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection | Jan 31, 2019 | Question AnsweringRelationship Detection | CodeCode Available | 0 | 5 |
| Evaluating Attribute Comprehension in Large Vision-Language Models | Aug 25, 2024 | AttributeImage-text matching | CodeCode Available | 0 | 5 |
| ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments | Oct 8, 2024 | DecoderQuestion Answering | CodeCode Available | 0 | 5 |
| MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models | Feb 28, 2025 | Decision MakingHallucination | CodeCode Available | 0 | 5 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understandingQuestion Answering | CodeCode Available | 0 | 5 |
| Measuring Faithful and Plausible Visual Grounding in VQA | May 24, 2023 | Question AnsweringVisual Grounding | CodeCode Available | 0 | 5 |
| Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations | Mar 5, 2025 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Are VLMs Really Blind | Oct 29, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Mar 29, 2023 | Cross-Modal RetrievalDecoder | CodeCode Available | 0 | 5 |
| Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | Dec 2, 2016 | Visual Question AnsweringVisual Question Answering (VQA) | CodeCode Available | 0 | 5 |
| MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Dec 31, 2024 | Multiple-choiceQuestion Answering | CodeCode Available | 0 | 5 |
| CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Sep 17, 2024 | cross-modal alignmentQuestion Answering | CodeCode Available | 0 | 5 |
| Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Jun 19, 2024 | Caption Generationimage-classification | CodeCode Available | 0 | 5 |
| Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Apr 20, 2025 | Autonomous DrivingImage Captioning | CodeCode Available | 0 | 5 |
| Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation | Jun 27, 2024 | Continual LearningQuestion Answering | CodeCode Available | 0 | 5 |
| Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data | Apr 7, 2025 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| Cascaded Mutual Modulation for Visual Reasoning | Sep 6, 2018 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |
| LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering | May 29, 2021 | Question AnsweringVisual Question Answering | CodeCode Available | 0 | 5 |