Deep Modular Co-Attention Networks for Visual Question Answering Jun 25, 2019 Question Answering Visual Question Answering
Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets Oct 12, 2024 Knowledge Distillation Question Answering
Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering May 21, 2024 Diversity Information Retrieval
Robustness through Data Augmentation Loss Consistency Oct 21, 2021 Multi-domain Dialogue State Tracking Visual Question Answering
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach Feb 3, 2021 Question Answering Visual Grounding
Recommending Themes for Ad Creative Design via Visual-Linguistic Representations Jan 20, 2020 Question Answering Recommendation Systems
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering Sep 15, 2023 Diversity Question Answering
Recursive Visual Attention in Visual Dialog Dec 6, 2018 Question Answering Visual Dialog
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering Feb 16, 2024 Question Answering Triplet
ReDiT: Re‑evaluating large visual question answering model confidence by defining input scenario Difficulty and applying Temperature mapping Jan 6, 2025 Question Answering Visual Question Answering
BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data Oct 1, 2024 Code Generation Logical Reasoning
Towards Knowledge-Augmented Visual Question Answering Dec 1, 2020 General Knowledge Graph Attention
https://arxiv.org/abs/2407.00634 Jul 2, 2024 Video Captioning Video Description
Towards Language-guided Visual Recognition via Dynamic Convolutions Oct 17, 2021 Question Answering Referring Expression
Answering Questions about Data Visualizations using Efficient Bimodal Fusion Aug 5, 2019 Chart Question Answering Optical Character Recognition
Relation-Aware Graph Attention Network for Visual Question Answering Mar 29, 2019 Graph Attention Implicit Relations
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction Jun 25, 2025 Benchmarking Person Identification
How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? Sep 3, 2024 In-Context Learning Language Modeling
How Modular Should Neural Module Networks Be for Systematic Generalization? Jun 15, 2021 Question Answering Systematic Generalization
High-Order Attention Models for Visual Question Answering Nov 12, 2017 Question Answering Visual Question Answering
REMIND Your Neural Network to Prevent Catastrophic Forgetting Oct 6, 2019 Quantization Question Answering
Hierarchical Deep Multi-modal Network for Medical Visual Question Answering Sep 27, 2020 Descriptive Medical Visual Question Answering
Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering Apr 8, 2019 Question Answering Video Question Answering
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation Jun 7, 2022 Knowledge Distillation Question Answering
Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics Jan 14, 2025 Video Quality Assessment Visual Question Answering (VQA)
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA Mar 17, 2021 Question Answering Relational Reasoning
Cross-Modal Contrastive Learning for Robust Reasoning in VQA Nov 21, 2022 Contrastive Learning Question Answering
Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective Dec 23, 2024 Question Answering Visual Question Answering
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Jul 17, 2025 Language Modeling Language Modelling
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? Oct 17, 2024 All Language Modeling
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers Mar 30, 2022 Question Answering Visual Commonsense Reasoning
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering Nov 7, 2022 Add - PO Add - PQ
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? Oct 28, 2024 Benchmarking Question Answering
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language May 28, 2023 Machine Translation Multimodal Machine Translation
An Improved Attention for Visual Question Answering Nov 4, 2020 Decoder Question Answering
Towards Visual Question Answering on Pathology Images Aug 1, 2021 Decision Making Question Answering
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory Dec 10, 2022 Image Captioning Language Modeling
HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models Dec 29, 2024 Hallucination Object
Counting Everyday Objects in Everyday Scenes Apr 12, 2016 Object Object Counting
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond Apr 24, 2020 Object Counting Question Answering
A Unified Hallucination Mitigation Framework for Large Vision-Language Models Sep 24, 2024 Hallucination Question Answering
Revisiting Video Quality Assessment from the Perspective of Generalization Sep 23, 2024 Image Quality Assessment Video Quality Assessment
Revisiting Visual Question Answering Baselines Jun 27, 2016 Binary Classification Multiple-choice
Hallucination Benchmark in Medical Visual Question Answering Jan 11, 2024 Hallucination Medical Visual Question Answering
HalLoc: Token-level Localization of Hallucinations for Vision Language Models Jun 12, 2025 Hallucination Image Captioning
Copy-Move Forgery Detection and Question Answering for Remote Sensing Image Dec 3, 2024 Question Answering Visual Question Answering
REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering Jul 27, 2020 Question Answering Visual Question Answering
Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach Jan 31, 2020 Question Answering Visual Question Answering
Right this way: Can VLMs Guide Us to See More to Answer Questions? Nov 1, 2024 Question Answering Visual Question Answering
Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning May 1, 2018 Commonsense Causal Reasoning Image Captioning