Neural Module Networks Nov 9, 2015 Visual Question Answering Visual Question Answering (VQA)
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction Jun 25, 2025 Benchmarking Person Identification
Fully Authentic Visual Question Answering Dataset from Online Communities Nov 27, 2023 Question Answering Visual Question Answering
https://arxiv.org/abs/2407.00634 Jul 2, 2024 Video Captioning Video Description
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering Nov 17, 2015 Image Captioning Question Answering
Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms Aug 29, 2018 Community Question Answering General Classification
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results Apr 24, 2024 Video Quality Assessment Visual Question Answering (VQA)
A simple neural network module for relational reasoning Jun 5, 2017 Image Retrieval with Multi-Modal Query Question Answering
Multimodal Residual Learning for Visual QA Jun 5, 2016 Multiple-choice Question Answering
A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models Aug 2, 2017 Question Answering Visual Question Answering
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering Aug 4, 2017 Question Answering Visual Question Answering
CLIPVQA:Video Quality Assessment via CLIP Jul 6, 2024 Video Quality Assessment Visual Question Answering (VQA)
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering Dec 19, 2024 Contrastive Learning Language Modeling
A Simple Baseline for Knowledge-Based Visual Question Answering Oct 20, 2023 In-Context Learning Question Answering
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models Dec 21, 2022 Question Answering Visual Question Answering
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Jun 6, 2016 Phrase Grounding Visual Grounding
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering Jun 6, 2019 Question Answering Video Question Answering
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence Feb 15, 2018 Activity Recognition Explainable Models
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding Oct 4, 2018 Question Answering Representation Learning
Multi-Image Visual Question Answering Dec 27, 2021 Question Answering Visual Question Answering
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering Feb 16, 2024 Question Answering Triplet
ILLUME: Rationalizing Vision-Language Models through Human Interactions Aug 17, 2022 Image Captioning Question Answering
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering May 27, 2025 Benchmarking Question Answering
ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images Feb 9, 2025 Clinical Knowledge Medical Visual Question Answering
MQA: Answering the Question via Robotic Manipulation Mar 10, 2020 Imitation Learning Question Answering
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions Jan 3, 2019 Diagnostic Image Segmentation
Focal Visual-Text Attention for Visual Question Answering Jun 5, 2018 Memex Question Answering Question Answering
Focal Visual-Text Attention for Memex Question Answering Dec 14, 2018 Memex Question Answering Question Answering
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images Jun 1, 2021 Question Answering Visual Question Answering
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images Apr 13, 2021 Question Answering Visual Question Answering
ArtQuest: Countering Hidden Language Biases in ArtVQA Jan 4, 2024 Question Answering Visual Question Answering
Modeling Relationships in Referential Expressions with Compositional Modular Networks Nov 30, 2016 Visual Question Answering (VQA)
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation May 16, 2025 Benchmarking Ethics
CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning Nov 26, 2018 Acoustic Question Answering Question Answering
Modularized Zero-shot VQA with Pre-trained Models May 27, 2023 object-detection Object Detection
Modulating early visual processing by language Jul 2, 2017 Question Answering Visual Question Answering
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach Oct 3, 2022 Referring Expression Robot Manipulation
Revisiting Video Quality Assessment from the Perspective of Generalization Sep 23, 2024 Image Quality Assessment Video Quality Assessment
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions Nov 20, 2023 Question Answering Visual Question Answering
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering Apr 3, 2018 Visual Question Answering Visual Question Answering (VQA)
FigureQA: An Annotated Figure Dataset for Visual Reasoning Oct 19, 2017 BIG-bench Machine Learning Chart Question Answering
Improved RAMEN: Towards Domain Generalization for Visual Question Answering Sep 6, 2021 Domain Generalization Question Answering
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering May 21, 2015 Question Answering Sentence
Few-Shot Multimodal Explanation for Visual Question Answering Oct 28, 2024 Explainable artificial intelligence Explainable Artificial Intelligence (XAI)
Federated Document Visual Question Answering: A Pilot Study May 10, 2024 Federated Learning Question Answering
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering Nov 1, 2021 multimodal interaction Multiple-choice
MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding Jan 11, 2020 Image Captioning Image-text Retrieval
Factor Graph Attention Apr 11, 2019 Graph Attention Question Answering
Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm Aug 16, 2024 Decision Making Medical Visual Question Answering
Medical Large Vision Language Models with Multi-Image Visual Ability May 25, 2025 Question Answering Visual Question Answering (VQA)