| Faithful Multimodal Explanation for Visual Question Answering | Sep 8, 2018 | Explanatory Visual Question Answering, Question Answering | Code Available | 1 | 5 |
| EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images | Oct 28, 2023 | Decision Making, Medical Visual Question Answering | Code Available | 1 | 5 |
| MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering | Oct 27, 2020 | Diagnostic, Question Answering | Code Available | 1 | 5 |
| Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | Feb 23, 2023 | Open-Domain Question Answering, Question Answering | Code Available | 1 | 5 |
| FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Mar 27, 2025 | Attribute, Benchmarking | Code Available | 1 | 5 |
| Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | May 31, 2022 | Common Sense Reasoning, Graph Generation | Code Available | 1 | 5 |
| Explaining Autonomous Driving Actions with Visual Question Answering | Jul 19, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 | 5 |
| Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering | Jul 22, 2023 | Graph Representation Learning, Language Modeling | Code Available | 1 | 5 |
| Foundation Model is Efficient Multimodal Multitask Model Selector | Aug 11, 2023 | model, Model Selection | Code Available | 1 | 5 |
| GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection | Nov 5, 2023 | Anomaly Detection, Question Answering | Code Available | 1 | 5 |