| Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA | May 31, 2023 | counterfactual, Counterfactual Inference | Unverified | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | Mar 19, 2025 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models | May 31, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena | Aug 17, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Variational Disentangled Attention for Regularized Visual Dialog | Sep 29, 2021 | Question Answering, Visual Dialog | Unverified | 0 |
| Variational Visual Question Answering | May 14, 2025 | Question Answering, Visual Question Answering | Unverified | 0 |
| VCD: Knowledge Base Guided Visual Commonsense Discovery in Images | Feb 27, 2024 | Decision Making, Language Modelling | Unverified | 0 |
| VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents | Apr 14, 2025 | Question Answering, RAG | Unverified | 0 |
| V-Doc : Visual questions answers with Documents | May 27, 2022 | Question Answering, Question Generation | Unverified | 0 |
| V-Doc: Visual Questions Answers With Documents | Jan 1, 2022 | Question Answering, Question Generation | Unverified | 0 |
| VGNMN: Video-grounded Neural Module Networks for Video-Grounded Dialogue Systems | Jul 1, 2022 | Information Retrieval, Question Answering | Unverified | 0 |
| VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks | Apr 16, 2021 | Information Retrieval, Question Answering | Unverified | 0 |
| Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera | May 30, 2024 | Question Answering, Video Question Answering | Unverified | 0 |
| Video Question Answering via Attribute-Augmented Attention Network Learning | Jul 20, 2017 | Attribute, Information Retrieval | Unverified | 0 |
| VILA^2: VILA Augmented VILA | Jul 24, 2024 | Hallucination, Optical Character Recognition (OCR) | Unverified | 0 |
| ViLMedic: a framework for research at the intersection of vision and language in medical AI | May 1, 2022 | Medical Visual Question Answering, Question Answering | Unverified | 0 |
| Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering | Mar 26, 2025 | Diagnostic, Hallucination | Unverified | 0 |
| Vision and Language: from Visual Perception to Content Creation | Dec 26, 2019 | Decoder, Question Answering | Unverified | 0 |
| Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning | Feb 18, 2024 | Hallucination, Visual Question Answering | Unverified | 0 |
| VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework | Mar 14, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Vision-Language Models as Success Detectors | Mar 13, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Vision Language Models Can Parse Floor Plan Maps | Sep 19, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| Vision-Language Models for Edge Networks: A Comprehensive Survey | Feb 11, 2025 | Autonomous Vehicles, Image Captioning | Unverified | 0 |
| Vision-Language Pretraining: Current Trends and the Future | May 1, 2022 | Question Answering, Representation Learning | Unverified | 0 |
| Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck | May 30, 2025 | Question Answering, Visual Question Answering | Unverified | 0 |
| Vision-to-Language Tasks Based on Attributes and Attention Mechanism | May 29, 2019 | Image Captioning, Question Answering | Unverified | 0 |
| VISREAS: Complex Visual Reasoning with Unanswerable Questions | Feb 23, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning | Sep 10, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| Visual7W: Grounded Question Answering in Images | Nov 11, 2015 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Unverified | 0 |
| Visual Commonsense based Heterogeneous Graph Contrastive Learning | Nov 11, 2023 | Contrastive Learning, Question Answering | Unverified | 0 |
| Visual Entailment: A Novel Task for Fine-Grained Image Understanding | Jan 20, 2019 | Natural Language Inference, Question Answering | Unverified | 0 |
| Visual Entailment Task for Visually-Grounded Language Learning | Nov 26, 2018 | Grounded language learning, Natural Language Inference | Unverified | 0 |
| Visual Explanations from Hadamard Product in Multimodal Deep Networks | Dec 18, 2017 | Question Answering, Visual Question Answering | Unverified | 0 |
| Visual Graph Question Answering with ASP and LLMs for Language Parsing | Feb 13, 2025 | Graph Question Answering, Optical Character Recognition | Unverified | 0 |
| Visual Grounding Strategies for Text-Only Natural Language Processing | Mar 25, 2021 | Image Retrieval, Language Modeling | Unverified | 0 |
| Visual Hallucination: Definition, Quantification, and Prescriptive Remediations | Mar 26, 2024 | Hallucination, Image Captioning | Unverified | 0 |
| Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem | Jul 24, 2022 | Diagnostic, Question Answering | Unverified | 0 |
| Visual Question Answering as a Meta Learning Task | Nov 22, 2017 | Meta-Learning, Question Answering | Unverified | 0 |
| Visual Question Answering as a Multi-Task Problem | Jul 3, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Visual Question Answering as Reading Comprehension | Nov 29, 2018 | Common Sense Reasoning, General Knowledge | Unverified | 0 |
| Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature | May 18, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Visual question answering based evaluation metrics for text-to-image generation | Nov 15, 2024 | Image Generation, Image Manipulation | Unverified | 0 |
| Visual Question Answering based on Formal Logic | Nov 8, 2021 | Formal Logic, Question Answering | Unverified | 0 |
| Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | Jan 22, 2021 | Question Answering, Referring Expression | Unverified | 0 |
| Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps | Aug 1, 2018 | Cross-Lingual Transfer, Image Captioning | Unverified | 0 |
| Visual Question Answering for Cultural Heritage | Mar 22, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Visual question answering: from early developments to recent advances -- a survey | Jan 7, 2025 | Descriptive, Natural Language Understanding | Unverified | 0 |
| Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective | Oct 22, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck | Jun 25, 2023 | object-detection, Object Detection | Unverified | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | Feb 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 |