Gender and Racial Bias in Visual Question Answering Datasets May 17, 2022 Question Answering Visual Question Answering
Gemini Pro Defeated by GPT-4V: Evidence from Education Dec 27, 2023 image-classification Image Classification
COCO is "ALL" You Need for Visual Instruction Fine-tuning Jan 17, 2024 All Image Captioning
Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources Nov 22, 2015 Form General Knowledge
Learning to Disambiguate by Asking Discriminative Questions Aug 9, 2017 Benchmarking Image Captioning
GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning Jun 22, 2025 Answer Generation Decision Making
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis Nov 25, 2024 Medical Visual Question Answering Multiple-choice
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance May 25, 2025 Caption Generation Question Answering
Asking questions on handwritten document collections Oct 2, 2021 Optical Character Recognition (OCR) Question Answering
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios Nov 20, 2024 Question Answering Visual Question Answering (VQA)
CoBIT: A Contrastive Bi-directional Image-Text Generation Model Mar 23, 2023 Decoder Image Generation
Learning to Recognize the Unseen Visual Predicates Sep 25, 2019 Question Answering Visual Question Answering
FVQA: Fact-based Visual Question Answering Jun 17, 2016 Common Sense Reasoning Question Answering
Learning to Specialize with Knowledge Distillation for Visual Question Answering Dec 1, 2018 General Classification General Knowledge
FVQA 2.0: Introducing Adversarial Samples into Fact-based Visual Question Answering Mar 19, 2023 Common Sense Reasoning Information Retrieval
MIMOQA: Multimodal Input Multimodal Output Question Answering Jun 1, 2021 Question Answering Visual Question Answering
Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering Apr 24, 2024 Language Modeling Language Modelling
Fusion of Detected Objects in Text for Visual Question Answering Aug 14, 2019 Question Answering Visual Commonsense Reasoning
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Mar 25, 2025 Autonomous Navigation Question Answering
Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment Aug 1, 2023 Diversity Knowledge Distillation
FunBench: Benchmarking Fundus Reading Skills of MLLMs Mar 2, 2025 Anatomy Benchmarking
Asking More Informative Questions for Grounded Retrieval Nov 14, 2023 Question Answering Question Selection
MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis Jul 3, 2024 Position Question Answering
Full-reference Video Quality Assessment for User Generated Content Transcoding Dec 19, 2023 Video Quality Assessment Visual Question Answering (VQA)
MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models Jun 2, 2023 In-Context Learning Language Modeling
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation Nov 21, 2023 Explanation Generation Visual Question Answering (VQA)
MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering Nov 11, 2022 Medical Visual Question Answering Question Answering
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense Aug 8, 2019 Question Answering Visual Question Answering (VQA)
Abduction of Domain Relationships from Data for VQA Feb 13, 2025 Question Answering Visual Question Answering
From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason Oct 1, 2019 Graph Neural Network Question Answering
MGA-VQA: Multi-Granularity Alignment for Visual Question Answering Jan 25, 2022 Question Answering Visual Question Answering
From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering Jun 25, 2022 Question Answering Visual Question Answering
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering Jun 4, 2022 Object Question Answering
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering Mar 1, 2025 Continual Learning Language Modeling
From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge Jun 10, 2025 Knowledge Graphs Language Modeling
From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts Nov 30, 2018 Novel Concepts Question Answering
Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? Dec 27, 2021 Articles Medical Visual Question Answering
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry Nov 17, 2024 Question Answering Scene Understanding
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities Nov 1, 2023 Navigate Question Answering
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering Jan 2, 2025 Multiple-choice Question Answering
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks Jan 15, 2022 Question Answering Visual Commonsense Reasoning
From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models Jan 1, 2023 Question Answering Visual Question Answering
From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data May 6, 2022 Question Answering Visual Question Answering
Memory Augmented Neural Networks for Natural Language Processing Sep 1, 2017 AI Agent Language Modeling
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM Jul 16, 2025 Attribute Face Swapping
Free Form Medical Visual Question Answering in Radiology Jan 23, 2024 Diagnostic Form
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment Mar 14, 2022 parameter-efficient fine-tuning Question Answering
A Short Survey of Systematic Generalization Nov 22, 2022 Survey Systematic Generalization
FOVQA: Blind Foveated Video Quality Assessment Jun 24, 2021 Video Compression Video Quality Assessment
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description Aug 1, 2016 Image Description Image Retrieval