| Compositionality as Lexical Symmetry | Jan 30, 2022 | Data Augmentation, Inductive Bias | Code Available | 0 |
| Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | Sep 18, 2022 | Attribute, Question Answering | Code Available | 0 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Dec 10, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| Compositional Image-Text Matching and Retrieval by Grounding Entities | May 4, 2025 | Image Captioning, Image-text matching | Code Available | 0 |
| TallyQA: Answering Complex Counting Questions | Oct 29, 2018 | Attribute, Object Counting | Code Available | 0 |
| Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios | May 21, 2023 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 0 |
| VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction | Mar 25, 2025 | Generative Visual Question Answering, Question Answering | Code Available | 0 |
| Mixture-of-Subspaces in Low-Rank Adaptation | Jun 16, 2024 | Common Sense Reasoning, Image Generation | Code Available | 0 |
| Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA | Mar 17, 2021 | Question Answering, Relational Reasoning | Code Available | 0 |
| MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering | Nov 1, 2021 | multimodal interaction, Multiple-choice | Code Available | 0 |
| Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering | Sep 30, 2022 | Continual Learning, Question Answering | Code Available | 0 |
| MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | Jan 11, 2020 | Image Captioning, Image-text Retrieval | Code Available | 0 |
| Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Aug 16, 2024 | Decision Making, Medical Visual Question Answering | Code Available | 0 |
| P ≈ NP, at least in Visual Question Answering | Mar 26, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models | Feb 28, 2025 | Decision Making, Hallucination | Code Available | 0 |
| AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Oct 28, 2024 | Benchmarking, Question Answering | Code Available | 0 |
| Measuring Faithful and Plausible Visual Grounding in VQA | May 24, 2023 | Question Answering, Visual Grounding | Code Available | 0 |
| ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments | Oct 8, 2024 | Decoder, Question Answering | Code Available | 0 |
| Patent Figure Classification using Large Vision-language Models | Jan 22, 2025 | Classification, Few-Shot Learning | Code Available | 0 |
| VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering | Dec 12, 2016 | Question Answering, Visual Question Answering | Code Available | 0 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 |
| MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Dec 31, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese | Oct 27, 2023 | Information Retrieval, Natural Language Queries | Code Available | 0 |
| MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Mar 29, 2023 | Cross-Modal Retrieval, Decoder | Code Available | 0 |
| ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering | Oct 18, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |