| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description, Multiple-choice | Code Available | 0 | 5 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Dec 10, 2024 | Multiple-choice, Question Answering | Code Available | 0 | 5 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action Detection, Action Recognition | Code Available | 0 | 5 |
| A simple neural network module for relational reasoning | Jun 5, 2017 | Image Retrieval with Multi-Modal Query, Question Answering | Code Available | 0 | 5 |
| MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | Jan 11, 2020 | Image Captioning, Image-text Retrieval | Code Available | 0 | 5 |
| A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models | Aug 2, 2017 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| A Simple Baseline for Knowledge-Based Visual Question Answering | Oct 20, 2023 | In-Context Learning, Question Answering | Code Available | 0 | 5 |
| Treble Counterfactual VLMs: A Causal Approach to Hallucination | Mar 8, 2025 | Autonomous Driving, counterfactual | Code Available | 0 | 5 |
| Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Aug 16, 2024 | Decision Making, Medical Visual Question Answering | Code Available | 0 | 5 |
| MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models | Feb 28, 2025 | Decision Making, Hallucination | Code Available | 0 | 5 |
| Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images | Feb 8, 2024 | Image Captioning, Question Answering | Code Available | 0 | 5 |
| Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning | Jul 6, 2022 | Diagnostic, Multi-Task Learning | Code Available | 0 | 5 |
| Discrete Subgraph Sampling for Interpretable Graph based Visual Question Answering | Dec 11, 2024 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | Code Available | 0 | 5 |
| Measuring Faithful and Plausible Visual Grounding in VQA | May 24, 2023 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models | Jul 22, 2024 | Disentanglement, Question Answering | Code Available | 0 | 5 |
| BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA | Mar 4, 2025 | Medical Diagnosis, Question Answering | Code Available | 0 | 5 |
| Uncovering the Full Potential of Visual Grounding Methods in VQA | Jan 15, 2024 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 | 5 |
| Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts | Jun 25, 2024 | Fairness, Question Answering | Code Available | 0 | 5 |
| ArtQuest: Countering Hidden Language Biases in ArtVQA | Jan 4, 2024 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Evaluating Attribute Comprehension in Large Vision-Language Models | Aug 25, 2024 | Attribute, Image-text matching | Code Available | 0 | 5 |
| ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments | Oct 8, 2024 | Decoder, Question Answering | Code Available | 0 | 5 |
| MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Mar 29, 2023 | Cross-Modal Retrieval, Decoder | Code Available | 0 | 5 |
| MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Dec 31, 2024 | Multiple-choice, Question Answering | Code Available | 0 | 5 |
| LXMERT Model Compression for Visual Question Answering | Oct 23, 2023 | model, Model Compression | Code Available | 0 | 5 |