CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images Apr 13, 2021 Question Answering Visual Question Answering
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering Apr 11, 2017 Visual Question Answering Visual Question Answering (VQA)
Simple Baseline for Visual Question Answering Dec 7, 2015 Visual Question Answering Visual Question Answering (VQA)
ArtQuest: Countering Hidden Language Biases in ArtVQA Jan 4, 2024 Question Answering Visual Question Answering
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering Aug 4, 2017 Question Answering Visual Question Answering
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Jun 6, 2016 Phrase Grounding Visual Grounding
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation May 16, 2025 Benchmarking Ethics
Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs Oct 26, 2023 Attribute Machine Translation
CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning Nov 26, 2018 Acoustic Question Answering Question Answering
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering Sep 15, 2023 Diversity Question Answering
Inferring and Executing Programs for Visual Reasoning May 10, 2017 Visual Question Answering (VQA) Visual Reasoning
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence Feb 15, 2018 Activity Recognition Explainable Models
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach Oct 3, 2022 Referring Expression Robot Manipulation
Multi-Image Visual Question Answering Dec 27, 2021 Question Answering Visual Question Answering
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions Nov 20, 2023 Question Answering Visual Question Answering
FigureQA: An Annotated Figure Dataset for Visual Reasoning Oct 19, 2017 BIG-bench Machine Learning Chart Question Answering
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering May 21, 2015 Question Answering Sentence
MUREL: Multimodal Relational Reasoning for Visual Question Answering Feb 25, 2019 Relational Reasoning Visual Question Answering
OmniNet: A unified architecture for multi-modal multi-task learning Jul 17, 2019 Image Captioning Multi-Task Learning
Few-Shot Multimodal Explanation for Visual Question Answering Oct 28, 2024 Explainable artificial intelligence Explainable Artificial Intelligence (XAI)
Federated Document Visual Question Answering: A Pilot Study May 10, 2024 Federated Learning Question Answering
Modulating early visual processing by language Jul 2, 2017 Question Answering Visual Question Answering
Modeling Relationships in Referential Expressions with Compositional Modular Networks Nov 30, 2016 Visual Question Answering (VQA)
StarVQA: Space-Time Attention for Video Quality Assessment Aug 22, 2021 Video Quality Assessment Visual Question Answering (VQA)
Factor Graph Attention Apr 11, 2019 Graph Attention Question Answering
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition Sep 26, 2023 Articles Image Comprehension
Active Learning for Visual Question Answering: An Empirical Study Nov 6, 2017 Active Learning Visual Question Answering
Modularized Zero-shot VQA with Pre-trained Models May 27, 2023 object-detection Object Detection
Are VLMs Really Blind Oct 29, 2024 Language Modeling Language Modelling
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering Nov 1, 2021 multimodal interaction Multiple-choice
Exploring the Potential of Encoder-free Architectures in 3D LMMs Feb 13, 2025 Inductive Bias Visual Question Answering (VQA)
Subjective and Objective Quality Assessment of High-Motion Sports Videos at Low-Bitrates Jul 12, 2022 Video Quality Assessment Visual Question Answering (VQA)
Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment Jul 8, 2022 Video Quality Assessment Visual Question Answering (VQA)
Mimic and Fool: A Task Agnostic Adversarial Attack Jun 11, 2019 Adversarial Attack Image Captioning
MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding Jan 11, 2020 Image Captioning Image-text Retrieval
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos Sep 21, 2022 Action Detection Action Recognition
Exploring Models and Data for Image Question Answering May 8, 2015 Image Segmentation object-detection
Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm Aug 16, 2024 Decision Making Medical Visual Question Answering
Are Red Roses Red? Evaluating Consistency of Question-Answering Models Jul 1, 2019 Question Answering valid
Measuring Faithful and Plausible Visual Grounding in VQA May 24, 2023 Question Answering Visual Grounding
Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering Mar 26, 2024 Decision Making Explainable artificial intelligence
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models Feb 28, 2025 Decision Making Hallucination
Explainable and Explicit Visual Reasoning over Scene Graphs Dec 5, 2018 Inductive Bias Visual Question Answering (VQA)
Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks Jul 30, 2024 Visual Question Answering (VQA)
CAST: Cross-modal Alignment Similarity Test for Vision Language Models Sep 17, 2024 cross-modal alignment Question Answering
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding Mar 18, 2025 document understanding Question Answering
Cascaded Mutual Modulation for Visual Reasoning Sep 6, 2018 Question Answering Visual Question Answering
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions Oct 17, 2024 Visual Question Answering (VQA)
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA Mar 15, 2022 Negation Question Generation
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks Mar 29, 2023 Cross-Modal Retrieval Decoder