Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering | Jun 29, 2023 | Answer Generation, Question Answering | Code Available | 1
Are Bias Mitigation Techniques for Deep Learning Effective? | Apr 1, 2021 | Deep Learning, Question Answering | Code Available | 1
Learning to Discretely Compose Reasoning Module Networks for Video Captioning | Jul 17, 2020 | Decoder, Question Answering | Code Available | 1
How to Configure Good In-Context Sequence for Visual Question Answering | Dec 4, 2023 | In-Context Learning, Question Answering | Code Available | 1
End-to-end Document Recognition and Understanding with Dessurt | Mar 30, 2022 | document understanding, Visual Question Answering (VQA) | Code Available | 1
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework | May 22, 2025 | Multiple-choice, Visual Question Answering (VQA) | Code Available | 1
End-to-end Knowledge Retrieval with Multi-modal Queries | Jun 1, 2023 | Benchmarking, Cross-Modal Retrieval | Code Available | 1
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images | Oct 1, 2021 | Question Answering, Visual Question Answering | Code Available | 1
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | Jul 25, 2017 | Image Captioning, Visual Question Answering | Code Available | 1
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering | Jul 10, 2021 | Graph Attention, Question Answering | Code Available | 1
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs | Nov 27, 2023 | Adversarial Robustness, Visual Question Answering (VQA) | Code Available | 1
IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents | Dec 10, 2024 | Cross-Modal Retrieval, Image Classification | Code Available | 1
HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment | Nov 18, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 1
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations | Dec 8, 2022 | Explanation Generation, Visual Entailment | Code Available | 1
Hierarchical Conditional Relation Networks for Video Question Answering | Feb 25, 2020 | Audio-Visual Question Answering (AVQA), Question Answering | Code Available | 1
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models | Mar 12, 2024 | Concept Alignment, Instruction Following | Code Available | 1
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training | Nov 23, 2023 | Multimodal Reasoning, Science Question Answering | Code Available | 1
Faithful Multimodal Explanation for Visual Question Answering | Sep 8, 2018 | Explanatory Visual Question Answering, Question Answering | Code Available | 1
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | Feb 23, 2023 | Open-Domain Question Answering, Question Answering | Code Available | 1
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | Dec 6, 2024 | Multimodal Reasoning, Visual Question Answering | Code Available | 1
Dual-Key Multimodal Backdoors for Visual Question Answering | Dec 14, 2021 | Question Answering, Visual Question Answering | Code Available | 1
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Apr 12, 2024 | Image Captioning, Question Answering | Code Available | 1
MapQA: A Dataset for Question Answering on Choropleth Maps | Nov 15, 2022 | Articles, Question Answering | Code Available | 1
MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration | Oct 6, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 1
Hierarchical multimodal transformers for Multi-Page DocVQA | Dec 7, 2022 | Decoder, Question Answering | Code Available | 1
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge | May 31, 2019 | object-detection, Object Detection | Code Available | 1
FAVER: Blind Quality Prediction of Variable Frame Rate Videos | Jan 5, 2022 | Cloud Computing, Video Quality Assessment | Code Available | 1
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Oct 12, 2022 | document-image-classification, Document Image Classification | Code Available | 1
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles | Dec 18, 2023 | Question Answering, Visual Question Answering | Code Available | 1
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | Mar 23, 2023 | Auxiliary Learning, Multimodal Sentiment Analysis | Code Available | 1
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Mar 17, 2025 | Articles, Benchmarking | Code Available | 1
Mimic In-Context Learning for Multimodal Tasks | Apr 11, 2025 | In-Context Learning, Visual Question Answering (VQA) | Code Available | 1
Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion | Feb 26, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 1
HallE-Control: Controlling Object Hallucination in Large Multimodal Models | Oct 3, 2023 | Attribute, Decoder | Code Available | 1
MISS: A Generative Pretraining and Finetuning Approach for Med-VQA | Jan 10, 2024 | Medical Visual Question Answering, Multi-Task Learning | Code Available | 1
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering | Sep 19, 2024 | Hallucination, Hallucination Evaluation | Code Available | 1
Hierarchical Question-Image Co-Attention for Visual Question Answering | May 31, 2016 | Visual Dialog, Visual Question Answering | Code Available | 1
LaPA: Latent Prompt Assist Model For Medical Visual Question Answering | Apr 19, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 1
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts | May 18, 2023 | Medical Visual Question Answering, Question Answering | Code Available | 1
MMBERT: Multimodal BERT Pretraining for Improved Medical VQA | Apr 3, 2021 | Language Modeling, Language Modelling | Code Available | 1
An Evaluation of Image-Based Verb Prediction Models against Human Eye-Tracking Data | Jun 1, 2018 | General Classification, Question Answering | Unverified | 0
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs | Jun 28, 2021 | Question Answering, Task 2 | Unverified | 0
D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions | Jul 2, 2024 | Diagnostic, Instruction Following | Unverified | 0
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | May 22, 2025 | Answer Generation, Question Answering | Unverified | 0
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment | Jul 1, 2023 | Language Modeling, Language Modelling | Unverified | 0
An Evaluation of GPT-4V and Gemini in Online VQA | Dec 17, 2023 | Question Answering, Visual Question Answering | Unverified | 0
Grounding Answers for Visual Questions Asked by Visually Impaired People | Jun 20, 2022 | Question Answering, Visual Question Answering | Unverified | 0
Grounding Complex Navigational Instructions Using Scene Graphs | Jun 3, 2021 | Question Answering, reinforcement-learning | Unverified | 0
Domain-robust VQA with diverse datasets and methods but no target labels | Mar 29, 2021 | Domain Adaptation, Object Recognition | Unverified | 0
Do Explanations make VQA Models more Predictable to a Human? | Oct 29, 2018 | Question Answering, Visual Question Answering | Unverified | 0