Modularized Zero-shot VQA with Pre-trained Models May 27, 2023 object-detection Object Detection
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering Mar 24, 2022 GPU Question Answering
Modeling Relationships in Referential Expressions with Compositional Modular Networks Nov 30, 2016 Visual Question Answering (VQA)
Modulating early visual processing by language Jul 2, 2017 Question Answering Visual Question Answering
Is Multimodal Vision Supervision Beneficial to Language? Feb 10, 2023 Image Retrieval Natural Language Understanding
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos Sep 21, 2022 Action Detection Action Recognition
Exploring Models and Data for Image Question Answering May 8, 2015 Image Segmentation object-detection
Are Red Roses Red? Evaluating Consistency of Question-Answering Models Jul 1, 2019 Question Answering valid
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering Nov 1, 2021 multimodal interaction Multiple-choice
Mimic and Fool: A Task Agnostic Adversarial Attack Jun 11, 2019 Adversarial Attack Image Captioning
Explainable and Explicit Visual Reasoning over Scene Graphs Dec 5, 2018 Inductive Bias Visual Question Answering (VQA)
CAST: Cross-modal Alignment Similarity Test for Vision Language Models Sep 17, 2024 cross-modal alignment Question Answering
Cascaded Mutual Modulation for Visual Reasoning Sep 6, 2018 Question Answering Visual Question Answering
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions Oct 17, 2024 Visual Question Answering (VQA)
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA Mar 15, 2022 Negation Question Generation
MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding Jan 11, 2020 Image Captioning Image-text Retrieval
Transformer Module Networks for Systematic Generalization in Visual Question Answering Jan 27, 2022 Question Answering Systematic Generalization
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models Feb 28, 2025 Decision Making Hallucination
Medical Large Vision Language Models with Multi-Image Visual Ability May 25, 2025 Question Answering Visual Question Answering (VQA)
A Question-Centric Model for Visual Question Answering in Medical Imaging Mar 2, 2020 Medical Image Analysis Question Answering
Applying recent advances in Visual Question Answering to Record Linkage Jul 12, 2020 Question Answering Visual Question Answering
Delving Deeper into Cross-lingual Visual Question Answering Feb 15, 2022 Inductive Bias Question Answering
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering Oct 1, 2022 Medical Visual Question Answering Question Answering
Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning Jul 6, 2022 Diagnostic Multi-Task Learning
Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm Aug 16, 2024 Decision Making Medical Visual Question Answering
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments Oct 8, 2024 Decoder Question Answering
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding Jul 1, 2024 Cell Detection Classification
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding Mar 18, 2025 document understanding Question Answering
Measuring Faithful and Plausible Visual Grounding in VQA May 24, 2023 Question Answering Visual Grounding
Multimodal Residual Learning for Visual QA Jun 5, 2016 Multiple-choice Question Answering
LXMERT Model Compression for Visual Question Answering Oct 23, 2023 model Model Compression
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base Dec 16, 2023 cross-modal alignment Knowledge Graphs
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations Mar 5, 2025 Question Answering Visual Question Answering
Enhancing the AI2 Diagrams Dataset Using Rhetorical Structure Theory May 1, 2018 Question Answering Visual Question Answering (VQA)
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Dec 2, 2016 Visual Question Answering Visual Question Answering (VQA)
Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning Jun 9, 2025 Future prediction Question Answering
Loss re-scaling VQA: Revisiting the LanguagePrior Problem from a Class-imbalance View Oct 30, 2020 Face Recognition image-classification
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language Mar 31, 2025 Form Question Answering
Locally Smoothed Neural Networks Nov 22, 2017 Face Verification Question Answering
Logical Implications for Visual Question Answering Consistency Mar 16, 2023 Language Modeling Language Modelling
Kvasir-VQA: A Text-Image Pair GI Tract Dataset Sep 2, 2024 Image Captioning Image Generation
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy Jun 11, 2025 Medical Visual Question Answering Question Answering
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering May 29, 2021 Question Answering Visual Question Answering
Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation Jun 27, 2024 Continual Learning Question Answering
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models Aug 26, 2024 Large Language Model Video Quality Assessment
Answer Them All! Toward Universal Visual Question Answering Models Mar 1, 2019 All Question Answering
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery Feb 26, 2024 Continual Learning Exemplar-Free
End-to-end optimization of goal-driven and visually grounded dialogue systems Mar 15, 2017 Decoder Deep Reinforcement Learning
End-to-End Instance Segmentation with Recurrent Attention May 30, 2016 Autonomous Driving Image Captioning
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach Feb 3, 2021 Question Answering Visual Grounding