| Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA | May 31, 2023 | counterfactual, Counterfactual Inference | —Unverified | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | Mar 19, 2025 | Language Model Evaluation, Language Modeling | —Unverified | 0 |
| Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models | May 31, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena | Aug 17, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Variational Disentangled Attention for Regularized Visual Dialog | Sep 29, 2021 | Question Answering, Visual Dialog | —Unverified | 0 |
| Variational Visual Question Answering | May 14, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 |
| VCD: Knowledge Base Guided Visual Commonsense Discovery in Images | Feb 27, 2024 | Decision Making, Language Modelling | —Unverified | 0 |
| VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents | Apr 14, 2025 | Question Answering, RAG | —Unverified | 0 |
| V-Doc : Visual questions answers with Documents | May 27, 2022 | Question Answering, Question Generation | —Unverified | 0 |
| V-Doc: Visual Questions Answers With Documents | Jan 1, 2022 | Question Answering, Question Generation | —Unverified | 0 |
| VGNMN: Video-grounded Neural Module Networks for Video-Grounded Dialogue Systems | Jul 1, 2022 | Information Retrieval, Question Answering | —Unverified | 0 |
| VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks | Apr 16, 2021 | Information Retrieval, Question Answering | —Unverified | 0 |
| Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera | May 30, 2024 | Question Answering, Video Question Answering | —Unverified | 0 |
| Video Question Answering via Attribute-Augmented Attention Network Learning | Jul 20, 2017 | Attribute, Information Retrieval | —Unverified | 0 |
| VILA^2: VILA Augmented VILA | Jul 24, 2024 | Hallucination, Optical Character Recognition (OCR) | —Unverified | 0 |
| ViLMedic: a framework for research at the intersection of vision and language in medical AI | May 1, 2022 | Medical Visual Question Answering, Question Answering | —Unverified | 0 |
| Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering | Mar 26, 2025 | Diagnostic, Hallucination | —Unverified | 0 |
| Vision and Language: from Visual Perception to Content Creation | Dec 26, 2019 | Decoder, Question Answering | —Unverified | 0 |
| Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning | Feb 18, 2024 | Hallucination, Visual Question Answering | —Unverified | 0 |
| VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework | Mar 14, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Vision-Language Models as Success Detectors | Mar 13, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Vision Language Models Can Parse Floor Plan Maps | Sep 19, 2024 | Image Captioning, Question Answering | —Unverified | 0 |
| Vision-Language Models for Edge Networks: A Comprehensive Survey | Feb 11, 2025 | Autonomous Vehicles, Image Captioning | —Unverified | 0 |
| Vision-Language Pretraining: Current Trends and the Future | May 1, 2022 | Question Answering, Representation Learning | —Unverified | 0 |
| Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck | May 30, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 |