| Cascaded Mutual Modulation for Visual Reasoning | Sep 6, 2018 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering | Nov 1, 2021 | multimodal interaction, Multiple-choice | Code Available | 0 | 5 |
| End-to-End Instance Segmentation with Recurrent Attention | May 30, 2016 | Autonomous Driving, Image Captioning | Code Available | 0 | 5 |
| End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features | Jun 21, 2018 | Question Answering, Video Description | Code Available | 0 | 5 |
| MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | Jan 11, 2020 | Image Captioning, Image-text Retrieval | Code Available | 0 | 5 |
| Mixture-of-Subspaces in Low-Rank Adaptation | Jun 16, 2024 | Common Sense Reasoning, Image Generation | Code Available | 0 | 5 |
| MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models | Feb 28, 2025 | Decision Making, Hallucination | Code Available | 0 | 5 |
| Measuring Faithful and Plausible Visual Grounding in VQA | May 24, 2023 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Aug 16, 2024 | Decision Making, Medical Visual Question Answering | Code Available | 0 | 5 |
| Learning to Count Objects in Natural Images for Visual Question Answering | Feb 15, 2018 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 | 5 |
| Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures | Jul 8, 2017 | Mixture-of-Experts, Question Answering | Code Available | 0 | 5 |
| MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Dec 31, 2024 | Multiple-choice, Question Answering | Code Available | 0 | 5 |
| Learning to Follow Object-Centric Image Editing Instructions Faithfully | Oct 29, 2023 | Object, Question Answering | Code Available | 0 | 5 |
| Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Apr 11, 2024 | Descriptive, Hallucination | Code Available | 0 | 5 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 | 5 |
| Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | Dec 2, 2016 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 | 5 |
| A Question-Centric Model for Visual Question Answering in Medical Imaging | Mar 2, 2020 | Medical Image Analysis, Question Answering | Code Available | 0 | 5 |
| EaSe: A Diagnostic Tool for VQA based on Answer Diversity | Jun 1, 2021 | Diagnostic, Diversity | Code Available | 0 | 5 |
| MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Mar 29, 2023 | Cross-Modal Retrieval, Decoder | Code Available | 0 | 5 |
| LXMERT Model Compression for Visual Question Answering | Oct 23, 2023 | model, Model Compression | Code Available | 0 | 5 |
| Applying recent advances in Visual Question Answering to Record Linkage | Jul 12, 2020 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering | Dec 24, 2024 | Explanatory Visual Question Answering, Multimodal Reasoning | Code Available | 0 | 5 |
| LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering | May 29, 2021 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers | Apr 21, 2024 | Diagnostic, Image Captioning | Code Available | 0 | 5 |
| Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery | Oct 29, 2023 | Deep Learning, Multimodal Deep Learning | Code Available | 0 | 5 |
| Logical Implications for Visual Question Answering Consistency | Mar 16, 2023 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Locally Smoothed Neural Networks | Nov 22, 2017 | Face Verification, Question Answering | Code Available | 0 | 5 |
| Dynamic Memory Networks for Visual and Textual Question Answering | Mar 4, 2016 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering | Mar 6, 2022 | Graph Attention, Question Answering | Code Available | 0 | 5 |
| LLaVA-OneVision: Easy Visual Task Transfer | Aug 6, 2024 | 3D Question Answering (3D-QA) | Code Available | 0 | 5 |
| Learning Visual Question Answering by Bootstrapping Hard Attention | Aug 1, 2018 | Hard Attention, Question Answering | Code Available | 0 | 5 |
| LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Feb 26, 2024 | Continual Learning, Exemplar-Free | Code Available | 0 | 5 |
| Loss re-scaling VQA: Revisiting the LanguagePrior Problem from a Class-imbalance View | Oct 30, 2020 | Face Recognition, image-classification | Code Available | 0 | 5 |
| Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | Sep 18, 2022 | Attribute, Question Answering | Code Available | 0 | 5 |
| Siamese Tracking with Lingual Object Constraints | Nov 23, 2020 | Object, Object Tracking | Code Available | 0 | 5 |
| Learning Visual Knowledge Memory Networks for Visual Question Answering | Jun 13, 2018 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering | Jun 1, 2019 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Learning to Specialize with Knowledge Distillation for Visual Question Answering | Dec 1, 2018 | General Classification, General Knowledge | Unverified | 0 | 0 |
| Learning to Select Question-Relevant Relations for Visual Question Answering | Jun 1, 2021 | Graph Attention, Question Answering | Unverified | 0 | 0 |
| Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering | Dec 13, 2018 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks | Apr 14, 2025 | Ethics, Fairness | Unverified | 0 | 0 |
| Learning to Recognize the Unseen Visual Predicates | Sep 25, 2019 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Neural Reasoning, Fast and Slow, for Video Question Answering | Jul 10, 2019 | Natural Questions, Question Answering | Unverified | 0 | 0 |
| DUBLIN -- Document Understanding By Language-Image Network | May 23, 2023 | Document Classification, document understanding | Unverified | 0 | 0 |
| BuDDIE: A Business Document Dataset for Multi-task Information Extraction | Apr 5, 2024 | Document Classification, document understanding | Unverified | 0 | 0 |
| Learning to Disambiguate by Asking Discriminative Questions | Aug 9, 2017 | Benchmarking, Image Captioning | Unverified | 0 | 0 |
| Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Sep 11, 2024 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Learning to Compose Diversified Prompts for Image Emotion Classification | Jan 26, 2022 | Classification, Emotion Classification | Unverified | 0 | 0 |
| DualNet: Domain-Invariant Network for Visual Question Answering | Jun 20, 2016 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Bridging the Semantic Gaps: Improving Medical VQA Consistency with LLM-Augmented Question Sets | Apr 16, 2025 | Diversity, Medical Visual Question Answering | Unverified | 0 | 0 |