Paper | Date | Tags | Code status
Measuring Faithful and Plausible Visual Grounding in VQA | May 24, 2023 | Question Answering, Visual Grounding | Code Available
EaSe: A Diagnostic Tool for VQA based on Answer Diversity | Jun 1, 2021 | Diagnostic, Diversity | Code Available
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery | Oct 29, 2023 | Deep Learning, Multimodal Deep Learning | Code Available
Learning to Count Objects in Natural Images for Visual Question Answering | Feb 15, 2018 | Visual Question Answering, Visual Question Answering (VQA) | Code Available
Dynamic Memory Networks for Visual and Textual Question Answering | Mar 4, 2016 | Question Answering, Visual Question Answering | Code Available
Targeted Visual Prompting for Medical Visual Question Answering | Aug 6, 2024 | Medical Visual Question Answering, Question Answering | Code Available
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering | Mar 6, 2022 | Graph Attention, Question Answering | Code Available
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Mar 29, 2023 | Cross-Modal Retrieval, Decoder | Code Available
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision | May 6, 2025 | Learning-To-Rank, Self-Supervised Learning | Code Available
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available
DVQA: Understanding Data Visualizations via Question Answering | Jan 24, 2018 | Articles, Chart Question Answering | Code Available
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | Dec 2, 2016 | Visual Question Answering, Visual Question Answering (VQA) | Code Available
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Dec 16, 2023 | cross-modal alignment, Knowledge Graphs | Code Available
LXMERT Model Compression for Visual Question Answering | Oct 23, 2023 | model, Model Compression | Code Available
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding | Jul 1, 2024 | Cell Detection, Classification | Code Available
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | Nov 17, 2019 | feature selection, Question Answering | Code Available
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available
Dual Recurrent Attention Units for Visual Question Answering | Feb 1, 2018 | Question Answering, Visual Question Answering | Code Available
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available
Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View | Oct 30, 2020 | Face Recognition, image-classification | Code Available
Lightweight Recurrent Cross-modal Encoder for Video Question Answering | Jun 30, 2023 | Action Recognition, Question Answering | Code Available
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering | May 29, 2021 | Question Answering, Visual Question Answering | Code Available
Logical Implications for Visual Question Answering Consistency | Mar 16, 2023 | Language Modeling, Language Modelling | Code Available
Dual Attention Networks for Visual Reference Resolution in Visual Dialog | Feb 25, 2019 | AI Agent, Question Answering | Code Available
Locally Smoothed Neural Networks | Nov 22, 2017 | Face Verification, Question Answering | Code Available
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Feb 26, 2024 | Continual Learning, Exemplar-Free | Code Available
Dual Attention Networks for Multimodal Reasoning and Matching | Nov 2, 2016 | Collaborative Inference, Image-text matching | Code Available
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models | Aug 26, 2024 | Large Language Model, Video Quality Assessment | Code Available
Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning | Jun 9, 2025 | Future prediction, Question Answering | Code Available
Mimic and Fool: A Task Agnostic Adversarial Attack | Jun 11, 2019 | Adversarial Attack, Image Captioning | Code Available
Simple Baseline for Visual Question Answering | Dec 7, 2015 | Visual Question Answering, Visual Question Answering (VQA) | Code Available
Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model | Jun 10, 2022 | Question Answering, Task 2 | Unverified
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? | Mar 25, 2025 | Autonomous Navigation, Question Answering | Unverified
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision | Apr 20, 2020 | counterfactual, image-classification | Unverified
Learning Visual Knowledge Memory Networks for Visual Question Answering | Jun 13, 2018 | Question Answering, Visual Question Answering | Unverified
D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions | Jul 2, 2024 | Diagnostic, Instruction Following | Unverified
An Evaluation of Image-Based Verb Prediction Models against Human Eye-Tracking Data | Jun 1, 2018 | General Classification, Question Answering | Unverified
Learning to Specialize with Knowledge Distillation for Visual Question Answering | Dec 1, 2018 | General Classification, General Knowledge | Unverified
Learning to Select Question-Relevant Relations for Visual Question Answering | Jun 1, 2021 | Graph Attention, Question Answering | Unverified
Learning to Recognize the Unseen Visual Predicates | Sep 25, 2019 | Question Answering, Visual Question Answering | Unverified
Neural Reasoning, Fast and Slow, for Video Question Answering | Jul 10, 2019 | Natural Questions, Question Answering | Unverified
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment | Jul 1, 2023 | Language Modeling, Language Modelling | Unverified
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios | Nov 20, 2024 | Question Answering, Visual Question Answering (VQA) | Unverified
An Evaluation of GPT-4V and Gemini in Online VQA | Dec 17, 2023 | Question Answering, Visual Question Answering | Unverified
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs | Jun 28, 2021 | Question Answering, Task 2 | Unverified
Learning to Disambiguate by Asking Discriminative Questions | Aug 9, 2017 | Benchmarking, Image Captioning | Unverified
Domain-robust VQA with diverse datasets and methods but no target labels | Mar 29, 2021 | Domain Adaptation, Object Recognition | Unverified
Do Explanations make VQA Models more Predictable to a Human? | Oct 29, 2018 | Question Answering, Visual Question Answering | Unverified
Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Sep 11, 2024 | Question Answering, Visual Question Answering | Unverified
Learning to Compose Diversified Prompts for Image Emotion Classification | Jan 26, 2022 | Classification, Emotion Classification | Unverified