| Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation | Dec 22, 2021 | Common Sense ReasoningQuestion Answering | CodeCode Available | 1 | 5 |
| MapQA: A Dataset for Question Answering on Choropleth Maps | Nov 15, 2022 | ArticlesQuestion Answering | CodeCode Available | 1 | 5 |
| Explaining Autonomous Driving Actions with Visual Question Answering | Jul 19, 2023 | Autonomous DrivingAutonomous Vehicles | CodeCode Available | 1 | 5 |
| Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture | Nov 22, 2021 | Handwritten Text Recognitionobject-detection | CodeCode Available | 1 | 5 |
| Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering | Jul 22, 2023 | Graph Representation LearningLanguage Modeling | CodeCode Available | 1 | 5 |
| GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection | Nov 5, 2023 | Anomaly DetectionQuestion Answering | CodeCode Available | 1 | 5 |
| INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model | Jul 23, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting | Oct 13, 2022 | Image CaptioningQuestion Answering | CodeCode Available | 1 | 5 |
| Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning | May 29, 2025 | DiagnosticQuestion Answering | CodeCode Available | 1 | 5 |
| Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering | Sep 19, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 1 | 5 |
| Evaluating Multimodal Representations on Visual Semantic Textual Similarity | Apr 4, 2020 | BenchmarkingImage Captioning | CodeCode Available | 1 | 5 |
| Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Dec 4, 2024 | Multimodal Large Language ModelVideo Understanding | CodeCode Available | 1 | 5 |
| Pano-AVQA: Grounded Audio-Visual Question Answering on 360deg Videos | Jan 1, 2021 | Audio-visual Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | Dec 6, 2024 | Multimodal ReasoningVisual Question Answering | CodeCode Available | 1 | 5 |
| ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding | Aug 5, 2022 | Image RetrievalQuestion Answering | CodeCode Available | 1 | 5 |
| Faithful Multimodal Explanation for Visual Question Answering | Sep 8, 2018 | Explanatory Visual Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding | May 26, 2025 | Question AnsweringVisual Question Answering | CodeCode Available | 1 | 5 |
| Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | Dec 19, 2024 | Contrastive LearningDecision Making | CodeCode Available | 1 | 5 |
| Beyond Embeddings: The Promise of Visual Table in Visual Reasoning | Mar 27, 2024 | Representation LearningVisual Question Answering | CodeCode Available | 1 | 5 |
| MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Oct 11, 2022 | Contrastive LearningImage-text matching | CodeCode Available | 1 | 5 |
| ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Apr 29, 2025 | DiagnosticQuestion Answering | CodeCode Available | 1 | 5 |
| Describe Anything Model for Visual Question Answering on Text-rich Images | Jul 16, 2025 | DescriptiveLanguage Modeling | CodeCode Available | 1 | 5 |
| Are Bias Mitigation Techniques for Deep Learning Effective? | Apr 1, 2021 | Deep LearningQuestion Answering | CodeCode Available | 1 | 5 |
| Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering | Apr 7, 2021 | Question AnsweringVisual Question Answering | CodeCode Available | 1 | 5 |
| Check It Again:Progressive Visual Question Answering via Visual Entailment | Aug 1, 2021 | Question AnsweringVisual Entailment | CodeCode Available | 1 | 5 |