| Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference | Sep 25, 2019 | Common Sense Reasoning, Question Answering | Unverified | 0 |
| Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs | Apr 23, 2024 | Question Answering, Retrieval | Unverified | 0 |
| WoLF: Wide-scope Large Language Model Framework for CXR Understanding | Mar 19, 2024 | Anatomy, Instruction Following | Unverified | 0 |
| xGQA: Cross-Lingual Visual Question Answering | Oct 16, 2021 | Cross-Lingual Transfer, Language Modeling | Unverified | 0 |
| Yin and Yang: Balancing and Answering Binary Visual Questions | Nov 16, 2015 | Question Answering, Visual Question Answering | Unverified | 0 |
| YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | Nov 1, 2019 | Caption Generation, Question Answering | Unverified | 0 |
| ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue | Sep 26, 2024 | Medical Visual Question Answering, Question Answering | Unverified | 0 |
| Zero-shot Action Localization via the Confidence of Large Vision-Language Models | Oct 18, 2024 | Action Localization, Language Modelling | Unverified | 0 |
| Zero-Shot Anomaly Detection in Battery Thermal Images Using Visual Question Answering with Prior Knowledge | May 22, 2025 | Anomaly Detection, Question Answering | Unverified | 0 |
| Zero-Shot Transfer VQA Dataset | Nov 2, 2018 | Question Answering, Transfer Learning | Unverified | 0 |
| Zero-Shot Visual Question Answering | Nov 17, 2016 | Question Answering, Retrieval | Unverified | 0 |
| Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking, Large Language Model | Unverified | 0 |
| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | Oct 12, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey | Nov 26, 2024 | Natural Language Understanding, Question Answering | Unverified | 0 |
| Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving | May 9, 2025 | Autonomous Driving, Backdoor Attack | Unverified | 0 |
| Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | Oct 9, 2023 | Hallucination, Object | Unverified | 0 |
| Neglected Risks: The Disturbing Reality of Children's Images in Datasets and the Urgent Call for Accountability | Apr 20, 2025 | Question Answering, Visual Question Answering | Unverified | 0 |
| NegVQA: Can Vision Language Models Understand Negation? | May 28, 2025 | Negation, Question Answering | Unverified | 0 |
| Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection | Mar 31, 2016 | Caption Generation, Classification | Unverified | 0 |
| Neural Memory Plasticity for Anomaly Detection | Oct 12, 2019 | Anomaly Detection, EEG | Unverified | 0 |
| Neural Self Talk: Image Understanding via Continuous Questioning and Answering | Dec 10, 2015 | Question Answering, Question Generation | Unverified | 0 |
| NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA | Nov 6, 2024 | Federated Learning, Language Modelling | Unverified | 0 |
| Neuro-Symbolic Spatio-Temporal Reasoning | Nov 28, 2022 | AI Agent, Image Segmentation | Unverified | 0 |
| Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" | Jun 20, 2020 | Graph Generation, Question Answering | Unverified | 0 |
| Neuro-Symbolic VQA: A review from the perspective of AGI desiderata | Apr 13, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |