VQA with Cascade of Self- and Co-Attention Blocks Feb 28, 2023 Question Answering Visual Question Answering
VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images May 6, 2024 Attribute Language Modeling
Watching the News: Towards VideoQA Models that can Read Nov 10, 2022 Question Answering Video Question Answering
Weakly Supervised Visual Question Answer Generation Jun 11, 2023 Answer Generation Dependency Parsing
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks Dec 6, 2019 Image Retrieval Inductive Bias
Webly Supervised Concept Expansion for General Purpose Vision Models Feb 4, 2022 Human-Object Interaction Detection Image Retrieval
What is needed for simple spatial language capabilities in VQA? Aug 17, 2019 Diagnostic Question Answering
What Large Language Models Bring to Text-rich VQA? Nov 13, 2023 Image Comprehension Optical Character Recognition (OCR)
What makes a good metric? Evaluating automatic metrics for text-to-image consistency Dec 18, 2024 Sensitivity Visual Question Answering (VQA)
When are Lemons Purple? The Concept Association Bias of Vision-Language Models Dec 22, 2022 Attribute image-classification
Where is this coming from? Making groundedness count in the evaluation of Document VQA models Mar 24, 2025 Question Answering Visual Question Answering
Where To Look: Focus Regions for Visual Question Answering Nov 23, 2015 Question Answering Visual Question Answering
Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering Oct 23, 2024 Federated Learning Medical Visual Question Answering
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities Oct 2, 2024 Question Answering Visual Question Answering
Why Does a Visual Question Have Different Answers? Aug 12, 2019 Question Answering Visual Question Answering
Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference Sep 25, 2019 Common Sense Reasoning Question Answering
WoLF: Wide-scope Large Language Model Framework for CXR Understanding Mar 19, 2024 Anatomy Instruction Following
Workshop on Document Intelligence Understanding Jul 31, 2023 document understanding Visual Question Answering (VQA)
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image Dec 3, 2024 Diagnostic Language Modeling
WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models Mar 22, 2022 Image Captioning Image Generation
XGPT: Cross-modal Generative Pre-Training for Image Captioning Mar 3, 2020 Data Augmentation Denoising
xGQA: Cross-Lingual Visual Question Answering Oct 16, 2021 Cross-Lingual Transfer Language Modeling
Yin and Yang: Balancing and Answering Binary Visual Questions Nov 16, 2015 Question Answering Visual Question Answering
YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension Nov 1, 2019 Caption Generation Question Answering
ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue Sep 26, 2024 Medical Visual Question Answering Question Answering
Zero-Shot Anomaly Detection in Battery Thermal Images Using Visual Question Answering with Prior Knowledge May 22, 2025 Anomaly Detection Question Answering
Zero-Shot Transfer VQA Dataset Nov 2, 2018 Question Answering Transfer Learning
Zero-Shot Video Question Answering with Procedural Programs Dec 1, 2023 Code Generation Language Modeling
Zero-Shot Visual Question Answering Nov 17, 2016 Question Answering Retrieval
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis Aug 27, 2024 Benchmarking Large Language Model
Multimodal Learning and Reasoning for Visual Question Answering Dec 1, 2017 Question Answering Representation Learning
Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering Dec 23, 2018 Cross-Modal Information Retrieval Information Retrieval
Multimodal Machine Learning: Integrating Language, Vision and Speech Jul 1, 2017 Audio-Visual Speech Recognition BIG-bench Machine Learning
Multimodal Neural Graph Memory Networks for Visual Question Answering Jul 1, 2020 Graph Neural Network Question Answering
Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data Jul 1, 2018 Image Description Machine Translation
Multimodal Reranking for Knowledge-Intensive Visual Question Answering Jul 17, 2024 Answer Generation Question Answering
Multi-Modal Retrieval Augmentation for Open-Ended and Knowledge-Intensive Video Question Answering Feb 17, 2025 Multiple-choice Question Answering
Multimodal Unified Attention Networks for Vision-and-Language Interactions Aug 12, 2019 Question Answering Visual Grounding
Multiple-Question Multiple-Answer Text-VQA Nov 15, 2023 Decoder Denoising
Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification Dec 28, 2023 Attribute cross-modal alignment
Multi-task Learning of Hierarchical Vision-Language Representation Dec 3, 2018 Multi-Task Learning Question Answering
MUST-VQA: MUltilingual Scene-text VQA Sep 14, 2022 Question Answering Visual Question Answering
MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering Jul 7, 2021 Medical Visual Question Answering Missing Labels
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Oct 18, 2024 Attribute Question Answering
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey Nov 26, 2024 Natural Language Understanding Question Answering
Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving May 9, 2025 Autonomous Driving Backdoor Attack
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models Oct 9, 2023 Hallucination Object
NegVQA: Can Vision Language Models Understand Negation? May 28, 2025 Negation Question Answering
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection Mar 31, 2016 Caption Generation Classification
Neural Memory Plasticity for Anomaly Detection Oct 12, 2019 Anomaly Detection EEG