MUREL: Multimodal Relational Reasoning for Visual Question Answering (Feb 25, 2019). Tags: Relational Reasoning, Visual Question Answering
MUTAN: Multimodal Tucker Fusion for Visual Question Answering (May 18, 2017). Tags: Visual Question Answering, Visual Question Answering (VQA)
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes (Sep 6, 2024). Tags: Multiple-choice, Question Answering
Multi-Sourced Compositional Generalization in Visual Question Answering (May 29, 2025). Tags: Question Answering, Visual Question Answering
Multi-Target Embodied Question Answering (Apr 9, 2019). Tags: Embodied Question Answering, Navigate
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism (Apr 29, 2024). Tags: document understanding, GPU
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering (Jun 17, 2025). Tags: Diagnostic, Question Answering
Open-Ended Visual Question-Answering (Oct 9, 2016). Tags: Question Answering, Sentence
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering (Sep 23, 2020). Tags: Question Answering, Visual Question Answering
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model (Jan 12, 2024). Tags: Language Modeling, Language Modelling
General Greedy De-bias Learning (Dec 20, 2021). Tags: image-classification, Image Classification
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory (Jul 4, 2021). Tags: Question Answering, Scene Understanding
HalLoc: Token-level Localization of Hallucinations for Vision Language Models (Jun 12, 2025). Tags: Hallucination, Image Captioning
Hallucination Benchmark in Medical Visual Question Answering (Jan 11, 2024). Tags: Hallucination, Medical Visual Question Answering
Multimodal Residual Learning for Visual QA (Jun 5, 2016). Tags: Multiple-choice, Question Answering
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling (Feb 20, 2025). Tags: Decoder, GPU
NAAQA: A Neural Architecture for Acoustic Question Answering (Jun 11, 2021). Tags: Acoustic Question Answering, Question Answering
Ask Your Neurons: A Deep Learning Approach to Visual Question Answering (May 9, 2016). Tags: Question Answering, Visual Question Answering
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding (Jun 6, 2016). Tags: Phrase Grounding, Visual Grounding
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence (Feb 15, 2018). Tags: Activity Recognition, Explainable Models
GAMIVAL: Video Quality Prediction on Mobile Cloud Gaming Content (May 3, 2023). Tags: Video Quality Assessment, Visual Question Answering (VQA)
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering (Aug 4, 2017). Tags: Question Answering, Visual Question Answering
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering (Dec 19, 2024). Tags: Contrastive Learning, Language Modeling
Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing (Jan 29, 2018). Tags: Question Answering, Visual Question Answering
FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment (Apr 12, 2025). Tags: Video Quality Assessment, Visual Question Answering (VQA)
Co-attending Regions and Detections with Multi-modal Multiplicative Embedding for VQA (Nov 18, 2017). Tags: Form, Question Answering
Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering (Nov 18, 2017). Tags: Form, Visual Question Answering
Hierarchical Deep Multi-modal Network for Medical Visual Question Answering (Sep 27, 2020). Tags: Descriptive, Medical Visual Question Answering
A Joint Sequence Fusion Model for Video Question Answering and Retrieval (Aug 7, 2018). Tags: Decoder, Multiple-choice
Multi-Image Visual Question Answering (Dec 27, 2021). Tags: Question Answering, Visual Question Answering
Fully Authentic Visual Question Answering Dataset from Online Communities (Nov 27, 2023). Tags: Question Answering, Visual Question Answering
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering (Nov 17, 2015). Tags: Image Captioning, Question Answering
MQA: Answering the Question via Robotic Manipulation (Mar 10, 2020). Tags: Imitation Learning, Question Answering
Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms (Aug 29, 2018). Tags: Community Question Answering, General Classification
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results (Apr 24, 2024). Tags: Video Quality Assessment, Visual Question Answering (VQA)
A simple neural network module for relational reasoning (Jun 5, 2017). Tags: Image Retrieval with Multi-Modal Query, Question Answering
A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models (Aug 2, 2017). Tags: Question Answering, Visual Question Answering
Modulating early visual processing by language (Jul 2, 2017). Tags: Question Answering, Visual Question Answering
CLIPVQA: Video Quality Assessment via CLIP (Jul 6, 2024). Tags: Video Quality Assessment, Visual Question Answering (VQA)
A Simple Baseline for Knowledge-Based Visual Question Answering (Oct 20, 2023). Tags: In-Context Learning, Question Answering
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models (Dec 21, 2022). Tags: Question Answering, Visual Question Answering
Modularized Zero-shot VQA with Pre-trained Models (May 27, 2023). Tags: object-detection, Object Detection
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering (Jun 6, 2019). Tags: Question Answering, Video Question Answering
Modeling Relationships in Referential Expressions with Compositional Modular Networks (Nov 30, 2016). Tags: Visual Question Answering (VQA)
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering (May 27, 2025). Tags: Benchmarking, Question Answering
ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images (Feb 9, 2025). Tags: Clinical Knowledge, Medical Visual Question Answering
A Dataset and Architecture for Visual Reasoning with a Working Memory (Mar 16, 2018). Tags: Diagnostic, Logical Reasoning
How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? (Sep 3, 2024). Tags: In-Context Learning, Language Modeling
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering (Nov 1, 2021). Tags: multimodal interaction, Multiple-choice
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions (Jan 3, 2019). Tags: Diagnostic, Image Segmentation