| MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants | Oct 13, 2021 | intent-classificationIntent Classification | —Unverified | 0 |
| OVQA: A Clinically Generated Visual Question Answering Dataset | Jul 7, 2022 | BenchmarkingMedical Visual Question Answering | —Unverified | 0 |
| OWLViz: An Open-World Benchmark for Visual Question Answering | Mar 4, 2025 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | Sep 14, 2022 | DecoderFew-Shot Image Classification | —Unverified | 0 |
| PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | Feb 16, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| PAM: Understanding Product Images in Cross Product Category Attribute Extraction | Jun 8, 2021 | AttributeAttribute Extraction | —Unverified | 0 |
| ParsVQA-Caps: A Benchmark for Visual Question Answering and Image Captioning in Persian | Dec 7, 2022 | Image CaptioningQuestion Answering | —Unverified | 0 |
| Patch-level Sounding Object Tracking for Audio-Visual Question Answering | Dec 14, 2024 | Audio-visual Question AnsweringObject Tracking | —Unverified | 0 |
| Pathological Visual Question Answering | Oct 6, 2020 | AI AgentQuestion Answering | —Unverified | 0 |
| PAT: Parallel Attention Transformer for Visual Question Answering in Vietnamese | Jul 17, 2023 | Question AnsweringVietnamese Visual Question Answering | —Unverified | 0 |
| PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering | Apr 19, 2024 | ArticlesInformation Retrieval | —Unverified | 0 |
| PDFVQA: A New Dataset for Real-World VQA on PDF Documents | Apr 13, 2023 | document understandingKey Information Extraction | —Unverified | 0 |
| PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models | Mar 16, 2025 | Machine UnlearningPrivacy Preserving | —Unverified | 0 |
| Performance Analysis of Traditional VQA Models Under Limited Computational Resources | Feb 9, 2025 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Parameter Efficient Reinforcement Learning from Human Feedback | Mar 15, 2024 | Question Answeringreinforcement-learning | —Unverified | 0 |
| PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | Jun 10, 2025 | Question AnsweringScene Understanding | —Unverified | 0 |
| Physically Grounded Vision-Language Models for Robotic Manipulation | Sep 5, 2023 | Image CaptioningLanguage Modelling | —Unverified | 0 |
| PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals | Nov 29, 2022 | Deep LearningQuestion Answering | —Unverified | 0 |
| PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Feb 12, 2024 | Instruction FollowingLogical Reasoning | —Unverified | 0 |
| Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text RetrievalQuestion Answering | —Unverified | 0 |
| POINTS: Improving Your Vision-language Model with Affordable Strategies | Sep 7, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Polar-VQA: Visual Question Answering on Remote Sensed Ice sheet Imagery from Polar Region | Mar 13, 2023 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models | Jun 14, 2024 | DecoderKnowledge Graphs | —Unverified | 0 |
| Predicting Relative Depth between Objects from Semantic Features | Jan 12, 2021 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| PreSTU: Pre-Training for Scene-Text Understanding | Sep 12, 2022 | DecoderImage Captioning | —Unverified | 0 |